LYNN H. LOOMIS and SHLOMO STERNBERG


Department of Mathematics, Harvard University 


ADVANCED CALCULUS 


REVISED EDITION 


JONES AND BARTLETT PUBLISHERS


Boston London


PREFACE 


This book is based on an honors course in advanced calculus that we gave in the 
1960’s. The foundational material, presented in the unstarred sections of Chap- 
ters 1 through 11, was normally covered, but different applications of this basic 
material were stressed from year to year, and the book therefore contains more
material than was covered in any one year. It can accordingly be used (with 
omissions) as a text for a year’s course in advanced calculus, or as a text for a 
three-semester introduction to analysis. 

The prerequisites are a good grounding in the calculus of one variable
from a mathematically rigorous point of view, together with some acquaintance 
with linear algebra. The reader should be familiar with limit and continuity type 
arguments and have a certain amount of mathematical sophistication. As possible
introductory texts, we mention Differential and Integral Calculus by R. Courant,
Calculus by T. Apostol, Calculus by M. Spivak, and Pure Mathematics by
G. Hardy. The reader should also have some experience with partial derivatives.

In overall plan the book divides roughly into a first half which develops the
calculus (principally the differential calculus) in the setting of normed vector
spaces, and a second half which deals with the calculus of differentiable manifolds.

Vector space calculus is treated in two chapters, the differential calculus in
Chapter 3, and the basic theory of ordinary differential equations in Chapter 6.
The other early chapters are auxiliary. The first two chapters develop the necessary
purely algebraic theory of vector spaces, Chapter 4 presents the material
on compactness and completeness needed for the more substantive results of
the calculus, and Chapter 5 contains a brief account of the extra structure
encountered in scalar product spaces. Chapter 7 is devoted to multilinear (tensor)
algebra and is, in the main, a reference chapter for later use. Chapter 8 deals
with the theory of (Riemann) integration on Euclidean spaces and includes (in 
exercise form) the fundamental facts about the Fourier transform. Chapters 9 
and 10 develop the differential and integral calculus on manifolds, while Chapter 
11 treats the exterior calculus of E. Cartan. 

The first eleven chapters form a logical unit, each chapter depending on the 
results of the preceding chapters. (Of course, many chapters contain material 
that can be omitted on first reading; this is generally found in starred sections.) 


On the other hand, Chapters 12, 13, and the latter parts of Chapters 6 and 11
are independent of each other, and are to be regarded as illustrative applications
of the methods developed in the earlier chapters. Presented here are elementary
Sturm-Liouville theory and Fourier series, elementary differential geometry,
potential theory, and classical mechanics. We usually covered only one or two
of these topics in our one-year course.

We have not hesitated to present the same material more than once from 
different points of view. For example, although we have selected the contraction 
mapping fixed-point theorem as our basic approach to the implicit-function 
theorem, we have also outlined a “Newton's method” proof in the text and have 
sketched still a third proof in the exercises. Similarly, the calculus of variations 
is encountered twice, once in the context of the differential calculus of an
infinite-dimensional vector space and later in the context of classical mechanics.
The notion of a submanifold of a vector space is introduced in the early chapters, 
while the invariant definition of a manifold is given later on. 

In the introductory treatment of vector space theory, we are more careful 
and precise than is customary. In fact, this level of precision of language is not 
maintained in the later chapters. Our feeling is that in linear algebra, where the 
concepts are so clear and the axioms so familiar, it is pedagogically sound to 
illustrate various subtle points, such as distinguishing between spaces that are 
normally identified, discussing the naturality of various maps, and so on. Later 
on, when overly precise language would be more cumbersome, the reader should 
be able to produce for himself a more precise version of any assertions that he 
finds to be formulated too loosely. Similarly, the proofs in the first few chapters 
are presented in more formal detail. Again, the philosophy is that once the 
student has mastered the notion of what constitutes a formal mathematical 
proof, it is safe and more convenient to present arguments in the usual mathe- 
matical colloquialisms. 

While the level of formality decreases, the level of mathematical sophistication
does not. Thus increasingly abstract and sophisticated mathematical
objects are introduced. It has been our experience that Chapter 9 contains the 
concepts most difficult for students to absorb, especially the notions of the 
tangent space to a manifold and the Lie derivative of various objects with 
respect to a vector field. 


There are exercises of many different kinds spread throughout the book. 
Some are in the nature of routine applications. Others ask the reader to fill in 
or extend various proofs of results presented in the text. Sometimes whole 
topics, such as the Fourier transform or the residue calculus, are presented in
exercise form. Due to the rather abstract nature of the textual material, the stu- 
dent is strongly advised to work out as many of the exercises as he possibly can. 

Any enterprise of this nature owes much to many people besides the authors, 
but we particularly wish to acknowledge the help of L. Ahlfors, A. Gleason, 
R. Kulkarni, R. Rasala, and G. Mackey and the general influence of the book by
Dieudonné. We also wish to thank the staff of Jones and Bartlett for their invaluable 
help in preparing this revised edition. 


Cambridge, Massachusetts                    L.H.L.
1968, 1989                                  S.S.


CONTENTS


Chapter 0   Introduction

    Logic: quantifiers
    The logical connectives
    Negations of quantifiers
    Sets
    Restricted variables
    Ordered pairs and relations
    Functions and mappings
    Product sets; index notation
    Composition
    Duality
    The Boolean operations
    Partitions and equivalence relations


Chapter 1   Vector Spaces

    Fundamental notions
    Vector spaces and geometry
    Product spaces and Hom(V, W)
    Affine subspaces and quotient spaces
    Direct sums
    Bilinearity


Chapter 2   Finite-Dimensional Vector Spaces

    Bases
    Dimension
    The dual space
    Matrices
    Trace and determinant
    Matrix computations
    The diagonalization of a quadratic form


Chapter 3   The Differential Calculus

    Review in R
    Norms
    Continuity
    Equivalent norms
    Infinitesimals
    The differential
    Directional derivatives; the mean-value theorem
    The differential and product spaces
    The differential and Rⁿ
    Elementary applications
    The implicit-function theorem
    Submanifolds and Lagrange multipliers
    Functional dependence
    Uniform continuity and function-valued mappings
    The calculus of variations
    The second differential and the classification of critical points
    The Taylor formula


Chapter 4   Compactness and Completeness

    Metric spaces; open and closed sets
    Topology
    Sequential convergence
    Sequential compactness
    Compactness and uniformity
    Equicontinuity
    Completeness
    A first look at Banach algebras
    The contraction mapping fixed-point theorem
    The integral of a parametrized arc
    The complex number system
    Weak methods


Chapter 5   Scalar Product Spaces

    Scalar products
    Orthogonal projection
    Self-adjoint transformations
    Orthogonal transformations
    Compact transformations


Chapter 6   Differential Equations

    The fundamental theorem
    Differentiable dependence on parameters
    The linear equation
    The nth-order linear equation
    Solving the inhomogeneous equation
    The boundary-value problem
    Fourier series


Chapter 7   Multilinear Functionals

    Bilinear functionals
    Multilinear functionals
    Permutations
    The sign of a permutation
    The subspace 𝔞ⁿ of alternating tensors
    The determinant
    The exterior algebra
    Exterior powers of scalar product spaces
    The star operator


Chapter 8   Integration

    Introduction
    Axioms
    Rectangles and paved sets
    The minimal theory
    The minimal theory (continued)
    Contented sets
    When is a set contented?
    Behavior under linear distortions
    Axioms for integration
    Integration of contented functions
    The change of variables formula
    Successive integration
    Absolutely integrable functions
    Problem set: The Fourier transform


Chapter 9   Differentiable Manifolds

    Atlases
    Functions, convergence
    Differentiable manifolds
    The tangent space
    Flows and vector fields
    Lie derivatives
    Linear differential forms
    Computations with coordinates
    Riemann metrics


Chapter 10   The Integral Calculus on Manifolds

    Compactness
    Partitions of unity
    Densities
    Volume density of a Riemann metric
    Pullback and Lie derivatives of densities
    The divergence theorem
    More complicated domains


Chapter 11   Exterior Calculus

    Exterior differential forms
    Oriented manifolds and the integration of exterior differential forms
    The operator d
    Stokes’ theorem
    Some illustrations of Stokes’ theorem
    The Lie derivative of a differential form
    Appendix I. “Vector analysis”
    Appendix II. Elementary differential geometry of surfaces in E³


Chapter 12   Potential Theory in Eⁿ

    Solid angle
    Green’s formulas
    The maximum principle
    Green’s functions
    The Poisson integral formula
    Consequences of the Poisson integral formula
    Harnack’s theorem
    Subharmonic functions
    Dirichlet’s problem
    Behavior near the boundary
    Dirichlet’s principle
    Physical applications
    Problem set: The calculus of residues


Chapter 13   Classical Mechanics

    The tangent and cotangent bundles
    Equations of variation
    The fundamental linear differential form on T*(M)
    The fundamental exterior two-form on T*(M)
    Hamiltonian mechanics
    The central-force problem
    The two-body problem
    Lagrange’s equations
    Variational principles
    Geodesic coordinates
    Euler’s equations
    Rigid-body motion
    Small oscillations
    Small oscillations (continued)
    Canonical transformations


Selected References

Notation Index

Index


CHAPTER 0 


INTRODUCTION 


This preliminary chapter contains a short exposition of the set theory that 
forms the substratum of mathematical thinking today. It begins with a brief 
discussion of logic, so that set theory can be discussed with some precision, and 
continues with a review of the way in which mathematical objects can be defined 
as sets. The chapter ends with four sections which treat specific set-theoretic 
topics. 

It is intended that this material be used mainly for reference. Some of it 
will be familiar to the reader and some of it will probably be new. We suggest 
that he read the chapter through “lightly” at first, and then refer back to it 
for details as needed. 


1. LOGIC: QUANTIFIERS 


A statement is a sentence which is true or false as it stands. Thus ‘1 < 2’ and
‘4 + 3 = 8’ are, respectively, true and false mathematical statements. Many
sentences occurring in mathematics contain variables and are therefore not true
or false as they stand, but become statements when the variables are given
values. Simple examples are ‘x < 4’, ‘x < y’, ‘x is an integer’, ‘3x² + y² = 10’.
Such sentences will be called statement frames. If P(x) is a frame containing the
one variable ‘x’, then P(5) is the statement obtained by replacing ‘x’ in P(x) by
the numeral ‘5’. For example, if P(x) is ‘x < 4’, then P(5) is ‘5 < 4’, P(√2)
is ‘√2 < 4’, and so on.

Another way to obtain a statement from the frame P(x) is to assert that P(x)
is always true. We do this by prefixing the phrase ‘for every x’. Thus, ‘for every
x, x < 4’ is a false statement, and ‘for every x, x² − 1 = (x − 1)(x + 1)’ is a
true statement. This prefixing phrase is called a universal quantifier. Synonymous
phrases are ‘for each x’ and ‘for all x’, and the symbol customarily
used is ‘(∀x)’, which can be read in any of these ways. One frequently presents
sentences containing variables as being always true without explicitly writing 
the universal quantifiers. For instance, the associative law for the addition of 
numbers is often written 


    x + (y + z) = (x + y) + z,


where it is understood that the equation is true for all x, y and z. Thus the
actual statement being made is

    (∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z].


Finally, we can convert the frame P(x) into a statement by asserting that
it is sometimes true, which we do by writing ‘there exists an x such that P(x)’.
This process is called existential quantification. Synonymous prefixing phrases
here are ‘there is an x such that’, ‘for some x’, and, symbolically, ‘(∃x)’.

The statement ‘(∀x)(x < 4)’ still contains the variable ‘x’, of course, but
‘x’ is no longer free to be given values, and is now called a bound variable.
Roughly speaking, quantified variables are bound and unquantified variables
are free. The notation ‘P(x)’ is used only when ‘x’ is free in the sentence being
discussed.

Now suppose that we have a sentence P(x, y) containing two free variables.
Clearly, we need two quantifiers to obtain a statement from this sentence.
This brings us to a very important observation. If quantifiers of both types are
used, then the order in which they are written affects the meaning of the statement;
(∃y)(∀x)P(x, y) and (∀x)(∃y)P(x, y) say different things. The first says that one y
can be found that works for all x: “there exists a y such that for all x ...”.
The second says that for each x a y can be found that works: “for each x there
exists a y such that ...”. But in the second case, it may very well happen that
when x is changed, the y that can be found will also have to be changed. The
existence of a single y that serves for all x is thus the stronger statement. For
example, it is true that (∀x)(∃y)(x < y) and false that (∃y)(∀x)(x < y). The
reader must be absolutely clear on this point; his whole mathematical future is
at stake. The second statement says that there exists a y, call it y₀, such that
(∀x)(x < y₀), that is, such that every number is less than y₀. This is false;
y₀ + 1, in particular, is not less than y₀. The first statement says that for each x
we can find a corresponding y. And we can: take y = x + 1.
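
The effect of quantifier order can also be checked mechanically when the variables range over a finite set. The following Python sketch is an illustration only and is not part of the book's development: the domain `D` and the frame `P` ("y is the successor of x modulo 5") are hypothetical choices, picked so that each x has a suitable y while no single y serves every x.

```python
# Illustrative sketch: quantifier order on a finite domain.
# D and P are hypothetical choices, not taken from the text.
D = range(5)

def P(x, y):
    """Frame with two free variables: y is the successor of x modulo 5."""
    return y == (x + 1) % 5

# (∀x)(∃y)P(x, y): for each x some y works (the y may depend on x).
forall_exists = all(any(P(x, y) for y in D) for x in D)

# (∃y)(∀x)P(x, y): one single y works for every x.
exists_forall = any(all(P(x, y) for x in D) for y in D)

print(forall_exists, exists_forall)
```

With these choices the first statement holds and the second fails, which mirrors the behavior of ‘x < y’ on the real numbers.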

On the other hand, among a group of quantifiers of the same type the order
does not affect the meaning. Thus ‘(∀x)(∀y)’ and ‘(∀y)(∀x)’ have the same meaning.
We often abbreviate such clumps of similar quantifiers by using the quantification
symbol only once, as in ‘(∀x, y)’, which can be read ‘for every x and y’.
Thus the strictly correct ‘(∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z]’ receives the
slightly more idiomatic rendition ‘(∀x, y, z)[x + (y + z) = (x + y) + z]’. The
situation is clearly the same for a group of existential quantifiers.

The beginning student generally feels that the prefixing phrases ‘for every x
there exists a y such that’ and ‘there exists a y such that for every x’ sound
artificial and are unidiomatic. This is indeed the case, but this awkwardness is the
price that has to be paid for the order of the quantifiers to be fixed, so that the
meaning of the quantified statement is clear and unambiguous. Quantifiers do
occur in ordinary idiomatic discourse, but their idiomatic occurrences often
house ambiguity. The following two sentences are good examples of such
ambiguous idiomatic usage: “Every x is less than some y” and “Some y is greater
than every x”. If a poll were taken, it would be found that most men on the
street feel that these two sentences say the same thing, but half will feel that the
common assertion is false and half will think it true! The trouble here is that
the matrix is preceded by one quantifier and followed by another, and the poor
reader doesn’t know which to take as the inside, or first applied, quantifier. The
two possible symbolic renditions of our first sentence, ‘[(∀x)(x < y)](∃y)’ and
‘(∀x)[(x < y)(∃y)]’, are respectively false and true. Mathematicians do use
hanging quantifiers in the interests of more idiomatic writing, but only if they
are sure the reader will understand their order of application, either from the
context or by comparison with standard usage. In general, a hanging quantifier
would probably be read as the inside, or first applied, quantifier, and with this
understanding our two ambiguous sentences become true and false in that order.

After this apology the reader should be able to tolerate the definition of
sequential convergence. It involves three quantifiers and runs as follows: The
sequence {xₙ} converges to x if (∀ε)(∃N)(∀n)(if n > N then |xₙ − x| < ε).
In exactly the same format, we define a function f to be continuous at a if
(∀ε)(∃δ)(∀x)(if |x − a| < δ then |f(x) − f(a)| < ε). We often omit an inside
universal quantifier by displaying the final frame, so that the universal quantification
is understood. Thus we define f to be continuous at a if for every ε
there is a δ such that

    if |x − a| < δ, then |f(x) − f(a)| < ε.


We shall study these definitions later. We remark only that it is perfectly 
possible to build up an intuitive understanding of what these and similar
quantified statements actually say. 


2. THE LOGICAL CONNECTIVES


When the word ‘and’ is inserted between two sentences, the resulting sentence 
is true if both constituent sentences are true and is false otherwise. That is, the 
“truth value”, T or F, of the compound sentence depends only on the truth
values of the constituent sentences. We can thus describe the way ‘and’ acts in
compounding sentences in the simple “truth table” 


    P   Q   P and Q
    T   T      T
    T   F      F
    F   T      F
    F   F      F


where ‘P’ and ‘Q’ stand for arbitrary statement frames. Words like ‘and’ are 
called logical connectives. It is often convenient to use symbols for connectives, 
and a standard symbol for ‘and’ is the ampersand ‘&’. Thus ‘P & Q’ is read 
‘P and Q’. 




Another logical connective is the word ‘or’. Unfortunately, this word is used 
ambiguously in ordinary discourse. Sometimes it is used in the exclusive sense, 
where ‘P or Q’ means that one of P and Q is true, but not both, and sometimes
it is used in the inclusive sense that at least one is true, and possibly both are
true. Mathematics cannot tolerate any fundamental ambiguity, and in mathematics
‘or’ is always used in the latter way. We thus have the truth table


    P   Q   P or Q
    T   T     T
    T   F     T
    F   T     T
    F   F     F


The above two connectives are binary, in the sense that they combine two
sentences to form one new sentence. The word ‘not’ applies to one sentence and
really shouldn’t be considered a connective at all; nevertheless, it is called a
unary connective. A standard symbol for ‘not’ is ‘~’. Its truth table is obviously


    P   ~P
    T   F
    F   T


In idiomatic usage the word ‘not’ is generally buried in the interior of a
sentence. We write ‘x is not equal to y’ rather than ‘not (x is equal to y)’.
However, for the purpose of logical manipulation, the negation sign (the word
‘not’ or a symbol like ‘~’) precedes the sentence being negated. We shall, of
course, continue to write ‘x ≠ y’, but keep in mind that this is idiomatic for
‘not (x = y)’ or ‘~(x = y)’.

We come now to the troublesome ‘if..., then...’ connective, which we
write as either ‘if P, then Q’ or ‘P ⇒ Q’. This is almost always applied in the
universally quantified context (∀x)(P(x) ⇒ Q(x)), and its meaning is best
unraveled by a study of this usage. We consider ‘if x < 3, then x < 5’ to be a
true sentence. More exactly, it is true for all x, so that the universal quantification
(∀x)(x < 3 ⇒ x < 5) is a true statement. This conclusion forces us to
agree that, in particular, ‘2 < 3 ⇒ 2 < 5’, ‘4 < 3 ⇒ 4 < 5’, and ‘6 < 3 ⇒
6 < 5’ are all true statements. The truth table for ‘⇒’ thus contains the
values entered below.


    P   Q   P ⇒ Q
    T   T     T
    T   F     -
    F   T     T
    F   F     T




On the other hand, we consider ‘x < 7 ⇒ x < 5’ to be a false sentence, and
therefore have to agree that ‘6 < 7 ⇒ 6 < 5’ is false. Thus the remaining row
in the table above gives the value ‘F’ for P ⇒ Q.
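
The completed table for ‘if P, then Q’ can be tabulated in a few lines of Python. This is a minimal sketch rather than anything the text prescribes: the helper name `implies` and the finite sample of integers are assumptions made for illustration, and checking a sample is of course not a proof of the universally quantified sentence.

```python
# Illustrative sketch: the truth table of 'if P, then Q'.
def implies(p, q):
    """Truth value of P ⇒ Q: false only when P is true and Q is false."""
    return (not p) or q

# The four rows of the table, in the order TT, TF, FT, FF.
table = [(p, q, implies(p, q)) for p in (True, False) for q in (True, False)]
for p, q, value in table:
    print(p, q, value)

# 'if x < 3, then x < 5', checked on a finite sample of integers
# (a hypothetical sample, not a proof for all x).
sample = range(-10, 11)
always_true = all(implies(x < 3, x < 5) for x in sample)

# 'if x < 7, then x < 5' fails for those x with 5 <= x < 7.
counterexamples = [x for x in sample if not implies(x < 7, x < 5)]
```

The list `counterexamples` contains 6, the value used above to show that the row with P true and Q false must carry the value ‘F’.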

Combinations of frame variables and logical connectives such as we have
been considering are called truth-functional forms. We can further combine the
elementary forms such as ‘P ⇒ Q’ and ‘~P’ by connectives to construct composite
forms such as ‘~(P ⇒ Q)’ and ‘(P ⇒ Q) & (Q ⇒ P)’. A sentence has a
given (truth-functional) form if it can be obtained from that form by substitution.
Thus ‘x < y or ~(x < y)’ has the form ‘P or ~P’, since it is obtained from this
form by substituting the sentence ‘x < y’ for the sentence variable ‘P’. Composite
truth-functional forms have truth tables that can be worked out by
combining the elementary tables. For example, ‘~(P ⇒ Q)’ has the table below,
the truth value for the whole form being in the column under the connective
which is applied last (‘~’ in this example).


    P   Q    ~ | (P ⇒ Q)
    T   T    F |    T
    T   F    T |    F
    F   T    F |    T
    F   F    F |    T


Thus ~(P ⇒ Q) is true only when P is true and Q is false.

A truth-functional form such as ‘P or (~P)’ which is always true (i.e., has
only ‘T’ in the final column of its truth table) is called a tautology or a tautologous
form. The reader can check that


    (P & (P ⇒ Q)) ⇒ Q     and     ((P ⇒ Q) & (Q ⇒ R)) ⇒ (P ⇒ R)


are also tautologous. Indeed, any valid principle of reasoning that does not 
involve quantifiers must be expressed by a tautologous form. 
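
Checking that a form is tautologous amounts to running through every row of its truth table, which is easy to automate. The sketch below is one possible way to do it; the function names `implies` and `is_tautology`, and the variable names for the individual forms, are hypothetical and introduced only for this illustration.

```python
from itertools import product

def implies(p, q):
    """Truth value of P ⇒ Q."""
    return (not p) or q

def is_tautology(form, n_vars):
    """True if form(...) is true under every assignment of truth values."""
    return all(form(*values)
               for values in product((True, False), repeat=n_vars))

# 'P or (~P)'
excluded_middle = lambda p: p or (not p)
# '(P & (P ⇒ Q)) ⇒ Q'
modus_ponens = lambda p, q: implies(p and implies(p, q), q)
# '((P ⇒ Q) & (Q ⇒ R)) ⇒ (P ⇒ R)'
syllogism = lambda p, q, r: implies(implies(p, q) and implies(q, r),
                                    implies(p, r))
# '(P ⇒ Q) ⇒ (Q ⇒ P)' is included as a form that is not tautologous.
converse = lambda p, q: implies(implies(p, q), implies(q, p))

print(is_tautology(excluded_middle, 1), is_tautology(modus_ponens, 2),
      is_tautology(syllogism, 3), is_tautology(converse, 2))
```

The last form fails when P is false and Q is true, which is why passing from ‘if P, then Q’ to ‘if Q, then P’ is not a valid principle of reasoning.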

The ‘if and only if’ form ‘P ⇔ Q’, or ‘P if and only if Q’, or ‘P iff Q’, is an
abbreviation for ‘(P ⇒ Q) & (Q ⇒ P)’. Its truth table works out to be


    P   Q   P ⇔ Q
    T   T     T
    T   F     F
    F   T     F
    F   F     T


That is, P ⇔ Q is true if P and Q have the same truth values, and is false
otherwise.

Two truth-functional forms A and B are said to be equivalent if (the final
columns of) their truth tables are the same, and, in view of the table for ‘⇔’,
we see that A and B are equivalent if A ⇔ B is tautologous, and conversely.




Replacing a sentence obtained by substitution in a form A by the equivalent 
sentence obtained by the same substitutions in an equivalent form B is a device 
much used in logical reasoning. Thus to prove a statement P true, it suffices to 
prove the statement ~P false, since ‘P’ and ‘~(~P)’ are equivalent forms. 
Other important equivalences are 


    ~(P or Q) ⇔ (~P) & (~Q),
    (P ⇒ Q) ⇔ Q or (~P),
    ~(P ⇒ Q) ⇔ P & (~Q).
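
Equivalence of two forms means that the final columns of their truth tables agree, so the three equivalences just listed can be confirmed by comparing the forms row by row. The following Python sketch does this under the assumption that truth values are modeled by Python booleans; `implies`, `equivalent` and `checks` are names invented for the example.

```python
from itertools import product

def implies(p, q):
    """Truth value of P ⇒ Q."""
    return (not p) or q

def equivalent(form_a, form_b, n_vars=2):
    """True if the two forms agree on every row of the truth table."""
    return all(form_a(*v) == form_b(*v)
               for v in product((True, False), repeat=n_vars))

checks = {
    "~(P or Q) vs (~P) & (~Q)": equivalent(
        lambda p, q: not (p or q),
        lambda p, q: (not p) and (not q)),
    "(P => Q) vs Q or (~P)": equivalent(
        implies,
        lambda p, q: q or (not p)),
    "~(P => Q) vs P & (~Q)": equivalent(
        lambda p, q: not implies(p, q),
        lambda p, q: p and (not q)),
}
for name, ok in checks.items():
    print(name, ok)
```

By contrast, `equivalent(implies, lambda p, q: implies(q, p))` is false: a conditional is not equivalent to its converse.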


A bit of conventional sloppiness which we shall indulge in for smoother 
idiom is the use of ‘if’ instead of the correct ‘if and only if’ in definitions. We 
define f to be continuous at x if so-and-so, meaning, of course, that f is continuous 
at x if and only if so-and-so. This causes no difficulty, since it is clear that ‘if 
and only if’ is meant when a definition is being given. 


3. NEGATIONS OF QUANTIFIERS 


The combinations ‘~(∀x)’ and ‘(∃x)~’ have the same meanings: something is
not always true if and only if it is sometimes false. Similarly, ‘~(∃y)’ and ‘(∀y)~’
have the same meanings. These equivalences can be applied to move a negation 
sign past each quantifier in a string of quantifiers, giving the following important 
practical rule: 


    In taking the negation of a statement beginning with a string of quantifiers,
    we simply change each quantifier to the opposite kind and move the negation
    sign to the end of the string.


Thus

    ~(∀x)(∃y)(∀z)P(x, y, z) ⇔ (∃x)(∀y)(∃z)~P(x, y, z).
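
On a finite domain the rule can be verified exhaustively. The sketch below is an illustration under stated assumptions: it takes the two-element domain `D = (0, 1)`, represents each frame P(x, y, z) as a dictionary from triples to truth values, and checks the displayed equivalence for every one of the 256 possible frames. The names `D`, `triples`, `rule_holds` and `all_frames` are hypothetical.

```python
from itertools import product

D = (0, 1)                               # hypothetical finite domain
triples = list(product(D, repeat=3))     # all <x, y, z> with entries in D

def rule_holds(P):
    """Compare ~(∀x)(∃y)(∀z)P with (∃x)(∀y)(∃z)~P for one frame P."""
    lhs = not all(any(all(P[(x, y, z)] for z in D) for y in D) for x in D)
    rhs = any(all(any(not P[(x, y, z)] for z in D) for y in D) for x in D)
    return lhs == rhs

# Every possible frame on D: one truth value for each of the 8 triples.
all_frames = [dict(zip(triples, values))
              for values in product((True, False), repeat=len(triples))]

rule_always_holds = all(rule_holds(P) for P in all_frames)
print(len(all_frames), rule_always_holds)
```

A finite check of this kind illustrates the rule but does not replace the argument given above, which applies to domains of any size.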


There are other principles of quantificational reasoning that can be isolated 
and which we shall occasionally mention, but none seem worth formalizing here. 


4. SETS

It is present-day practice to define every mathematical object as a set of some 
kind or other, and we must examine this fundamental notion, however briefly. 

A set is a collection of objects that is itself considered an entity. The objects
in the collection are called the elements or members of the set. The symbol for
‘is a member of’ is ‘∈’ (a sort of capital epsilon), so that ‘x ∈ A’ is read “x is a
member of A”, “x is an element of A”, “x belongs to A”, or “x is in A”.

We use the equals sign ‘=’ in mathematics to mean logical identity; A = B 
means that A is B. Now a set A is considered to be the same object as a set B 
if and only if A and B have exactly the same members. That is, ‘A = B’ means
that (∀x)(x ∈ A ⇔ x ∈ B).




We say that a set A is a subset of a set B, or that A is included in B (or that
B is a superset of A) if every element of A is an element of B. The symbol for
inclusion is ‘⊂’. Thus ‘A ⊂ B’ means that (∀x)(x ∈ A ⇒ x ∈ B). Clearly,


    (A = B) ⇔ (A ⊂ B) and (B ⊂ A).


This is a frequently used way of establishing set identity: we prove that A = B
by proving that A ⊂ B and that B ⊂ A. If the reader thinks about the above
equivalence, he will see that it depends first on the equivalence of the truth-functional
forms ‘P ⇔ Q’ and ‘(P ⇒ Q) & (Q ⇒ P)’, and then on the obvious
quantificational equivalence between ‘(∀x)(R & S)’ and ‘(∀x)R & (∀x)S’.
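
For finite sets the two-inclusion method can be carried out literally. In the Python sketch below, which is only an illustration, `is_subset` and `same_set` are hypothetical helper names and the sets `A`, `B`, `C` are arbitrary examples; inclusion is taken in the sense used here, so that a set is included in itself.

```python
def is_subset(A, B):
    """A ⊂ B: every element of A is an element of B."""
    return all(x in B for x in A)

def same_set(A, B):
    """A = B, established by proving both inclusions."""
    return is_subset(A, B) and is_subset(B, A)

# Hypothetical examples: two descriptions of the same finite set.
A = {n for n in range(20) if n % 6 == 0}
B = {n for n in range(20) if n % 2 == 0 and n % 3 == 0}
C = {n for n in range(20) if n % 2 == 0}

print(same_set(A, B), is_subset(A, C), is_subset(C, A))
```

Here `A` and `B` are defined by different frames but have the same members, so they are the same set, while `A` is included in `C` without being equal to it.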

We define a set by specifying its members. If the set is finite, the members 
can actually be listed, and the notation used is braces surrounding a member- 
ship list. For example {1, 4, 7} is the set containing the three numbers 1, 4, 7, 
{x} is the unit set of x (the set having only the one object x as a member),
and {x, y} is the pair set of x and y. We can abuse this notation to name some
infinite sets. Thus {2, 4, 6, 8, ...} would certainly be considered the set of all
even positive integers. But infinite sets are generally defined by statement
frames. If P(x) is a frame containing the free variable ‘x’, then {x : P(x)} is the
set of all x such that P(x) is true. In other words, {x : P(x)} is that set A such
that

    y ∈ A ⇔ P(y).


For example, {x : x² < 9} is the set of all real numbers x such that x² < 9,
that is, the open interval (−3, 3), and y ∈ {x : x² < 9} ⇔ y² < 9. A statement
frame P(x) can be thought of as stating a property that an object x may or may
not have, and {x : P(x)} is the set of all objects having that property.

We need the empty set ∅, in much the same way that we need zero in
arithmetic. If P(x) is never true, then {x : P(x)} = ∅. For example,


    {x : x ≠ x} = ∅.


When we said earlier that all mathematical objects are customarily con- 
sidered sets, it was taken for granted that the reader understands the distinction 
between an object and a name of that object. To be on the safe side, we add a 
few words. A chair is not the same thing as the word ‘chair’, and the number 4 
is a mathematical object that is not the same thing as the numeral ‘4’. The 
numeral ‘4’ is a name of the number 4, as also are ‘four’, ‘2 + 2’, and ‘IV’.
According to our present viewpoint, 4 itself is taken to be some specific set. 
There is no need in this course to carry logical analysis this far, but some readers 
may be interested to know that we usually define 4 as {0, 1, 2, 3}. Similarly,
2 = {0, 1}, 1 = {0}, and 0 is the empty set ∅.
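
The definition of each number as the set of all smaller numbers can be imitated with Python's immutable `frozenset`, which (unlike `set`) may itself be a member of another set. This is a sketch for the curious reader only; the function name `numeral` is an assumption made for the illustration.

```python
def numeral(n):
    """Return the set representing n: the set of all smaller numbers."""
    # For n = 0 the range is empty, so the result is the empty set.
    return frozenset(numeral(k) for k in range(n))

zero = numeral(0)
one = numeral(1)
two = numeral(2)
four = numeral(4)

print(len(zero), len(one), len(two), len(four))
```

In this model `one == frozenset({zero})` and `two == frozenset({zero, one})`, in agreement with 1 = {0} and 2 = {0, 1}.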

It should be clear from the above discussion and our exposition thus far
that we are using a symbol surrounded by single quotation marks as a name of 
that symbol (the symbol itself being a name of something else). Thus ‘ ‘4’’ is a 
name of ‘4’ (which is itself a name of the number 4). This is strictly correct 




usage, but mathematicians almost universally mishandle it. It is accurate to
write: let x be the number; call this number ‘x’. However, the latter is almost
always written: call this number x. This imprecision causes no difficulty to the
reading mathematician, and it often saves the printed page from a shower of 
quotation marks. There is, however, a potential victim of such ambiguous 
treatment of symbols. This is the person who has never realized that mathe- 
matics is not about symbols but about objects to which the symbols refer. Since 
by now the present reader has safely avoided this pitfall, we can relax and 
occasionally omit the strictly necessary quotation marks. 

In order to avoid overworking the word ‘set’, we use many synonyms, 
such as ‘class’, ‘collection’, ‘family’ and ‘aggregate’. Thus we might say, “Let ℱ
be a family of classes of sets”. If a shoe store is a collection of pairs of shoes, then
a chain of shoe stores is such a three-level object. 


5. RESTRICTED VARIABLES 


A variable used in mathematics is not allowed to take all objects as values; it
can only take as values the members of a certain set, called the domain of the
variable. The domain is sometimes explicitly indicated, but is often only implied.
For example, the letter ‘n’ is customarily used to specify an integer, so
that ‘(∀n)P(n)’ would automatically be read “for every integer n, P(n)”. However,
sometimes n is taken to be a positive integer. In case of possible ambiguity
or doubt, we would indicate the restriction explicitly and write ‘(∀n ∈ Z)P(n)’,
where ‘Z’ is the standard symbol for the set of all integers. The quantifier is
read, literally, “for all n in Z”, and more freely, “for every integer n”. Similarly,
‘(∃n ∈ Z)P(n)’ is read “there exists an n in Z such that P(n)” or “there exists
an integer n such that P(n)”. Note that the symbol ‘∈’ is here read as the
preposition ‘in’. The above quantifiers are called restricted quantifiers.

In the same way, we have restricted set formation, both implicit and explicit, 
as in ‘{n : P(n)}’ and ‘{n ∈ Z : P(n)}’, both of which are read “the set of all
integers n such that P(n)”.

Restricted variables can be defined as abbreviations of unrestricted variables 
by 


    (∀x ∈ A)P(x) ⇔ (∀x)(x ∈ A ⇒ P(x)),
    (∃x ∈ A)P(x) ⇔ (∃x)(x ∈ A & P(x)),
    {x ∈ A : P(x)} = {x : x ∈ A & P(x)}.
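
These three abbreviations can be compared side by side when everything is finite. The Python sketch below is an illustration under stated assumptions: the "unrestricted" variable is made to range over a hypothetical finite universe `U`, and the set `A` and the frame `P` are arbitrary choices.

```python
U = range(-5, 6)                     # hypothetical universe of discourse
A = {x for x in U if x > 0}          # hypothetical restricting set

def P(x):
    """Hypothetical frame: x squared is at least x."""
    return x * x >= x

def implies(p, q):
    """Truth value of P ⇒ Q."""
    return (not p) or q

# (∀x ∈ A)P(x)  versus  (∀x)(x ∈ A ⇒ P(x))
forall_restricted = all(P(x) for x in A)
forall_unrestricted = all(implies(x in A, P(x)) for x in U)

# (∃x ∈ A)P(x)  versus  (∃x)(x ∈ A & P(x))
exists_restricted = any(P(x) for x in A)
exists_unrestricted = any((x in A) and P(x) for x in U)

# {x ∈ A : P(x)}  versus  {x : x ∈ A & P(x)}
set_restricted = {x for x in A if P(x)}
set_unrestricted = {x for x in U if x in A and P(x)}
```

Note that the universal quantifier is unpacked with ‘⇒’ while the existential quantifier and set formation are unpacked with ‘&’; interchanging the two connectives gives statements with different meanings.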


Although there is never any ambiguity in sentences containing explicitly 
restricted variables, it sometimes helps the eye to see the structure of the 
sentence if the restricting phrases are written in superscript position, as in 
(∀ε^{>0})(∃n^{∈Z}). Some restriction was implicit on page 1. If the reader agreed that
(∀x)(x² − 1 = (x − 1)(x + 1)) was true, he probably took x to be a real
number.




6. ORDERED PAIRS AND RELATIONS 


Ordered pairs are basic tools, as the reader knows from analytic geometry. 
According to our general principle, the ordered pair <a, b> is taken to be a
certain set, but here again we don’t care which particular set it is so long as it
guarantees the crucial characterizing property:


    <x, y> = <a, b>  ⇔  x = a and y = b.


Thus <1, 3> ≠ <3, 1>.
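
One standard way of realizing an ordered pair as a set is Kuratowski's encoding <a, b> = {{a}, {a, b}}. The text deliberately leaves the choice of set open, so the sketch below shows only one possible choice, not the book's definition; the function name `pair` is an assumption, and the characterizing property is checked by brute force on a small hypothetical range of values.

```python
from itertools import product

def pair(a, b):
    """Kuratowski's encoding of <a, b> as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Check  <x, y> = <a, b>  ⇔  x = a and y = b  on a small sample of values.
values = range(3)
property_holds = all(
    (pair(x, y) == pair(a, b)) == (x == a and y == b)
    for x, y, a, b in product(values, repeat=4)
)

print(property_holds, pair(1, 3) == pair(3, 1))
```

The pair set {1, 3}, by contrast, is the same set as {3, 1}, which is why an ordered pair cannot simply be taken to be the pair set of its two elements.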

The notion of a correspondence, or relation, and the special case of a mapping,
or function, is fundamental to mathematics. A correspondence is a pairing
of objects such that given any two objects x and y, the pair <x, y> either does
or does not correspond. A particular correspondence (relation) is generally
presented by a statement frame P(x, y) having two free variables, with x and y
corresponding if and only if P(x, y) is true. Given any relation (correspondence),
the set of all ordered pairs <x, y> of corresponding elements is called its graph.

Now a relation is a mathematical object, and, as we have said several times,
it is current practice to regard every mathematical object as a set of some sort
or other. Since the graph of a relation is a set (of ordered pairs), it is efficient and
customary to take the graph to be the relation. Thus a relation (correspondence)
is simply a set of ordered pairs. If $R$ is a relation, then we say that $x$ has the
relation $R$ to $y$, and we write ‘$xRy$’, if and only if $\langle x, y \rangle \in R$. We also say
that $x$ corresponds to $y$ under $R$. The set of all first elements occurring in the
ordered pairs of a relation $R$ is called the domain of $R$ and is designated dom $R$
or $D(R)$. Thus

$$\operatorname{dom} R = \{x : (\exists y)\,\langle x, y \rangle \in R\}.$$

The set of second elements is called the range of $R$:

$$\operatorname{range} R = \{y : (\exists x)\,\langle x, y \rangle \in R\}.$$

The inverse, $R^{-1}$, of a relation $R$ is the set of ordered pairs obtained by reversing
those of $R$:

$$R^{-1} = \{\langle x, y \rangle : \langle y, x \rangle \in R\}.$$
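
Since a relation is simply a set of ordered pairs, a finite one can be written down and manipulated directly. The sketch below is an illustration only: the relation R and the helper names dom, rng, and inverse are our own choices, and Python tuples stand in for ordered pairs.

```python
# A small relation chosen only for illustration, as a set of ordered pairs.
R = {(1, 'a'), (1, 'b'), (2, 'a')}

def dom(rel):
    # The set of first elements of the pairs in rel.
    return {x for (x, y) in rel}

def rng(rel):
    # The set of second elements of the pairs in rel.
    return {y for (x, y) in rel}

def inverse(rel):
    # Reverse every ordered pair.
    return {(y, x) for (x, y) in rel}
```

Reversing twice returns the original relation, and the domain of the inverse is the range of the relation.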


A statement frame $P(x, y)$ having two free variables actually determines a pair
of mutually inverse relations $R$ and $S$, called the graphs of $P$, as follows:

$$R = \{\langle x, y \rangle : P(x, y)\}, \qquad S = \{\langle y, x \rangle : P(x, y)\}.$$


A two-variable frame together with a choice of which variable is considered to 
be first might be called a directed frame. Then a directed frame would have a 
uniquely determined relation for its graph. The relation of strict inequality
on the real number system $\mathbb{R}$ would be considered the set $\{\langle x, y \rangle : x < y\}$,
since the variables in ‘$x < y$’ have a natural order.

The set $A \times B = \{\langle x, y \rangle : x \in A \;\&\; y \in B\}$ of all ordered pairs with
first element in $A$ and second element in $B$ is called the Cartesian product of the
sets $A$ and $B$. A relation $R$ is always a subset of dom $R$ $\times$ range $R$. If the two
“factor spaces” are the same, we can use exponential notation: $A^2 = A \times A$.

The Cartesian product $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ is the “analytic plane”. Analytic
geometry rests upon the one-to-one coordinate correspondence between $\mathbb{R}^2$ and
the Euclidean plane $\mathbb{E}^2$ (determined by an axis system in the latter), which
enables us to treat geometric questions algebraically and algebraic questions
geometrically. In particular, since a relation between sets of real numbers is a
subset of $\mathbb{R}^2$, we can “picture” it by the corresponding subset of the Euclidean
plane, or of any model of the Euclidean plane, such as this page. A simple
Cartesian product is shown in Fig. 0.1 ($A \cup B$ is the union of the sets $A$ and $B$).


[Fig. 0.1: a Cartesian product $A \times B$ of two sets of real numbers.]
[Fig. 0.2: the image $R[A]$ of a set $A$ under a relation $R$.]

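If $R$ is a relation and $A$ is any set, then the restriction of $R$ to $A$, $R \restriction A$,
is the subset of $R$ consisting of those pairs with first element in $A$:

$$R \restriction A = \{\langle x, y \rangle : \langle x, y \rangle \in R \text{ and } x \in A\}.$$

Thus $R \restriction A = R \cap (A \times \operatorname{range} R)$, where $C \cap D$ is the intersection of the sets
$C$ and $D$.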

If $R$ is a relation and $A$ is any set, then the image of $A$ under $R$, $R[A]$, is
the set of second elements of ordered pairs in $R$ whose first elements are in $A$:

$$R[A] = \{y : (\exists x)(x \in A \;\&\; \langle x, y \rangle \in R)\}.$$

Thus $R[A] = \operatorname{range}(R \restriction A)$, as shown in Fig. 0.2.
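
The two identities just stated, that the restriction is the intersection of R with A × range R and that the image is the range of the restriction, can be confirmed on a finite relation. This is a sketch under our own assumptions: the relation R, the set A, and the helper names are hypothetical.

```python
# A finite relation and a set A, both chosen only for illustration.
R = {(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')}
A = {1, 2}

def rng(rel):
    return {y for (x, y) in rel}

def product(X, Y):
    # The Cartesian product X x Y as a set of pairs.
    return {(x, y) for x in X for y in Y}

def restrict(rel, X):
    # Those pairs of rel whose first element lies in X.
    return {(x, y) for (x, y) in rel if x in X}

def image(rel, X):
    # Second elements of pairs of rel whose first element lies in X.
    return {y for (x, y) in rel if x in X}
```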


7. FUNCTIONS AND MAPPINGS 


A function is a relation $f$ such that each domain element $x$ is paired with exactly
one range element $y$. This property can be expressed as follows:

$$\langle x, y \rangle \in f \text{ and } \langle x, z \rangle \in f \;\Rightarrow\; y = z.$$




The $y$ which is thus uniquely determined by $f$ and $x$ is designated $f(x)$:

$$y = f(x) \iff \langle x, y \rangle \in f.$$


One tends to think of a function as being active and a relation which is not
a function as being passive. A function $f$ acts on an element $x$ in its domain to
give $f(x)$. We take $x$ and apply $f$ to it; indeed we often call a function an operator.
On the other hand, if $R$ is a relation but not a function, then there is in general
no particular $y$ related to an element $x$ in its domain, and the pairing of $x$ and $y$
is viewed more passively.

We often define a function f by specifying its value f(x) for each x in its 
domain, and in this connection a stopped arrow notation is used to indicate the 
pairing. Thus $x \mapsto x^2$ is the function assigning to each number $x$ its square $x^2$.


[Fig. 0.3: the graph of $f: x \mapsto x^2$, with the points $\langle -2, 4 \rangle$ and $\langle 2, 4 \rangle$ marked.]


If we want it to be understood that $f$ is this function, we can write “Consider
the function $f: x \mapsto x^2$”. The domain of $f$ must be understood for this notation
to be meaningful.

If $f$ is a function, then $f^{-1}$ is of course a relation, but in general it is not a
function. For example, if $f$ is the function $x \mapsto x^2$, then $f^{-1}$ contains the pairs
$\langle 4, 2 \rangle$ and $\langle 4, -2 \rangle$ and so is not a function (see Fig. 0.3). If $f^{-1}$ is a function,
we say that $f$ is one-to-one and that $f$ is a one-to-one correspondence between
its domain and its range. Each $x \in \operatorname{dom} f$ corresponds to only one $y \in \operatorname{range} f$
($f$ is a function), and each $y \in \operatorname{range} f$ corresponds to only one $x \in \operatorname{dom} f$ ($f^{-1}$ is
a function).

The notation 

$$f: A \to B$$


is read “a (the) function $f$ on $A$ into $B$” or “the function $f$ from $A$ to $B$”. The
notation implies that $f$ is a function, that $\operatorname{dom} f = A$, and that $\operatorname{range} f \subset B$.
Many people feel that the very notion of function should include all these
ingredients; that is, a function should be considered an ordered triple $\langle f, A, B \rangle$,
where $f$ is a function according to our more limited definition, $A$ is the domain
of $f$, and $B$ is a superset of the range of $f$, which we shall call the codomain of $f$ in
this context. We shall use the terms ‘map’, ‘mapping’, and ‘transformation’
for such a triple, so that the notation $f: A \to B$ in its totality presents a mapping.
Moreover, when there is no question about which set is the codomain, we shall
often call the function $f$ itself a mapping, since the triple $\langle f, A, B \rangle$ is then
determined by $f$. The two arrow notations can be combined, as in: “Define
$f: \mathbb{R} \to \mathbb{R}$ by $x \mapsto x^2$”.

A mapping $f: A \to B$ is said to be injective if $f$ is one-to-one, surjective if
$\operatorname{range} f = B$, and bijective if it is both injective and surjective. A bijective
mapping $f: A \to B$ is thus a one-to-one correspondence between its domain $A$
and its codomain $B$. Of course, a function is always surjective onto its range $R$,
and the statement that $f$ is surjective means that $R = B$, where $B$ is the understood
codomain.
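
For finite sets these definitions can be tested directly on the graph. The sketch below is illustrative only and rests on our own conventions: a function is stored as a set of pairs, the codomain B is passed separately (since it is not determined by the graph), and the helper names are hypothetical.

```python
def is_function(rel):
    # Each first element must be paired with exactly one second element.
    firsts = [x for (x, y) in rel]
    return len(firsts) == len(set(firsts))

def is_injective(f):
    # f is one-to-one exactly when its relational inverse is a function.
    return is_function({(y, x) for (x, y) in f})

def is_surjective(f, B):
    # The range of f must be all of the understood codomain B.
    return {y for (x, y) in f} == set(B)

def is_bijective(f, B):
    return is_injective(f) and is_surjective(f, B)

# The squaring function on a small domain, and a shift map.
square = {(x, x * x) for x in (-2, -1, 0, 1, 2)}
shift = {(x, x + 1) for x in range(3)}
```

Note that surjectivity of the same graph changes with the codomain: the squaring function above is surjective onto {0, 1, 4} but not onto {0, 1, 2, 3, 4}.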


8. PRODUCT SETS; INDEX NOTATION


One of the characteristic habits of the modern mathematician is that as soon as 
a new kind of object has been defined and discussed a little, he immediately 
looks at the set of all such objects. With the notion of a function from A to S 
well in hand, we naturally consider the set of all functions from $A$ to $S$, which we
designate $S^A$. Thus $\mathbb{R}^{\mathbb{R}}$ is the set of all real-valued functions of one real variable,
and $S^{\mathbb{Z}^+}$ is the set of all infinite sequences in $S$. (It is understood that an infinite
sequence is nothing but a function whose domain is the set $\mathbb{Z}^+$ of all positive
integers.) Similarly, if we set $\bar{n} = \{1, \ldots, n\}$, then $S^{\bar{n}}$ is the set of all finite
sequences of length $n$ in $S$.

If $B$ is a subset of $S$, then its characteristic function (relative to $S$) is the function
on $S$, usually designated $\chi_B$, which has the constant value 1 on $B$ and the
constant value 0 off $B$. The set of all characteristic functions of subsets of $S$ is
thus $2^S$ (since $2 = \{0, 1\}$). But because this collection of functions is in a
natural one-to-one correspondence with the collection of all subsets of $S$, $\chi_B$
corresponding to $B$, we tend to identify the two collections. Thus $2^S$ is also
interpreted as the set of all subsets of $S$. We shall spend most of the remainder
of this section discussing further similar definitional ambiguities which
mathematicians tolerate.

The ordered triple $\langle x, y, z \rangle$ is usually defined to be the ordered pair
$\langle \langle x, y \rangle, z \rangle$. The reason for this definition is probably that a function of
two variables $x$ and $y$ is ordinarily considered a function of the single ordered
pair variable $\langle x, y \rangle$, so that, for example, a real-valued function of two real
variables is a subset of $(\mathbb{R} \times \mathbb{R}) \times \mathbb{R}$. But we also consider such a function a
subset of Cartesian 3-space $\mathbb{R}^3$. Therefore, we define $\mathbb{R}^3$ as $(\mathbb{R} \times \mathbb{R}) \times \mathbb{R}$;
that is, we define the ordered triple $\langle x, y, z \rangle$ as $\langle \langle x, y \rangle, z \rangle$.

On the other hand, the ordered triple $\langle x, y, z \rangle$ could also be regarded as
the finite sequence $\{\langle 1, x \rangle, \langle 2, y \rangle, \langle 3, z \rangle\}$, which, of course, is a different
object. These two models for an ordered triple serve equally well, and, again,
mathematicians tend to slur over the distinction. We shall have more to say
on this point later when we discuss natural isomorphisms (Section 1.6). For
the moment we shall simply regard $\mathbb{R}^3$ and $\mathbb{R}^{\bar{3}}$ as being the same; an ordered
triple is something which can be “viewed” as being either an ordered pair of
which the first element is an ordered pair or as a sequence of length 3 (or, for that
matter, as an ordered pair of which the second element is an ordered pair).

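Similarly, we pretend that Cartesian 4-space $\mathbb{R}^4$ is $\mathbb{R}^{\bar{4}}$, $\mathbb{R}^2 \times \mathbb{R}^2$, or
$\mathbb{R}^1 \times \mathbb{R}^3 = \mathbb{R} \times ((\mathbb{R} \times \mathbb{R}) \times \mathbb{R})$, etc. Clearly, we are in effect assuming an
associative law for ordered pair formation that we don’t really have.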

This kind of ambiguity, where we tend to identify two objects that really are 
distinct, is a necessary corollary of deciding exactly what things are. It is one 
of the prices we pay for the precision of set theory; in days when mathematics 
was vaguer, there would have been a single fuzzy notion. 

The device of indices, which is used frequently in mathematics, also has am- 
biguous implications which we should examine. An indexed collection, as a set, 
is nothing but the range set of a function, the indexing function, and a particular 
indexed object, say $x_i$, is simply the value of that function at the domain element $i$.
If the set of indices is $I$, the indexed set is designated $\{x_i : i \in I\}$ or $\{x_i\}_{i \in I}$
(or $\{x_i\}_{i=1}^{\infty}$ in case $I = \mathbb{Z}^+$). However, this notation suggests that we view the
indexed set as being obtained by letting the index run through the index set $I$
and collecting the indexed objects. That is, an indexed set is viewed as being
the set together with the indexing function. This ambivalence is reflected in the
fact that the same notation frequently designates the mapping. Thus we refer
to the sequence $\{x_n\}_{n=1}^{\infty}$, where, of course, the sequence is the mapping $n \mapsto x_n$.
We believe that if the reader examines his idea of a sequence he will find this 
ambiguity present. He means neither just the set nor just the mapping, but the 
mapping with emphasis on its range, or the range “together with” the mapping. 
But since set theory cannot reflect these nuances in any simple and graceful way, 
we shall take an indexed set to be the indexing function. Of course, the same
range object may be repeated with different indices; there is no implication that 
an indexing is one-to-one. Note also that indexing imposes no restriction on the 
set being indexed; any set can at least be self-indexed (by the identity function). 

Except for the ambiguous ‘$\{x_i : i \in I\}$’, there is no universally used notation
for the indexing function. Since $x_i$ is the value of the function at $i$, we might
think of ‘$x_i$’ as another way of writing ‘$x(i)$’, in which case we designate the
function ‘$x$’ or ‘$\mathbf{x}$’. We certainly do this in the case of ordered $n$-tuplets when
we say, “Consider the $n$-tuplet $\mathbf{x} = \langle x_1, \ldots, x_n \rangle$”. On the other hand, there
is no compelling reason to use this notation. We can call the indexing function
anything we want; if it is $f$, then of course $f(i) = x_i$ for all $i$.

We come now to the general definition of Cartesian product. Earlier we
argued (in a special case) that the Cartesian product $A \times B \times C$ is the set of
all ordered triples $\mathbf{x} = \langle x_1, x_2, x_3 \rangle$ such that $x_1 \in A$, $x_2 \in B$, and $x_3 \in C$.
More generally, $A_1 \times A_2 \times \cdots \times A_n$, or $\prod_{i=1}^{n} A_i$, is the set of ordered
$n$-tuples $\mathbf{x} = \langle x_1, \ldots, x_n \rangle$ such that $x_i \in A_i$ for $i = 1, \ldots, n$. If we interpret
an ordered $n$-tuplet as a function on $\bar{n} = \{1, \ldots, n\}$, we have

$\prod_{i=1}^{n} A_i$ is the set of all functions $\mathbf{x}$ with domain $\bar{n}$ such that $x_i \in A_i$
for all $i \in \bar{n}$.


This rephrasal generalizes almost verbatim to give us the notion of the 
Cartesian product of an arbitrary indexed collection of sets. 


Definition. The Cartesian product $\prod_{i \in I} S_i$ of the indexed collection of
sets $\{S_i : i \in I\}$ is the set of all functions $f$ with domain the index set $I$
such that $f(i) \in S_i$ for all $i \in I$.


We can also use the notation $\prod \{S_i : i \in I\}$ for the product and $f_i$ for the
value $f(i)$.
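
For a finite index set and finite factors, the definition can be realized literally: each element of the product is a function on the index set, here stored as a Python dict. The sketch is only an illustration; the function name cartesian_product, the index set {'a', 'b'}, and the factor sets are our own hypothetical choices.

```python
from itertools import product

def cartesian_product(family):
    """family is a dict sending each index i to a finite set S_i.

    Returns the list of all functions f (as dicts) with domain the
    index set and f[i] in S_i for every index i.
    """
    indices = list(family)
    return [dict(zip(indices, choice))
            for choice in product(*(family[i] for i in indices))]

# An indexed collection with a non-numerical index set.
S = {'a': {0, 1}, 'b': {'x', 'y', 'z'}}
P = cartesian_product(S)
```

The index set here is not of the form {1, ..., n}, which is the point of the general definition: no ordering of the indices is needed.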


9. COMPOSITION 


If we are given maps $f: A \to B$ and $g: B \to C$, then the composition of $g$ with $f$,
$g \circ f$, is the map of $A$ into $C$ defined by

$$(g \circ f)(x) = g(f(x)) \quad \text{for all } x \in A.$$


This is the function of a function operation of elementary calculus. If $f$ and $g$ are
the maps from $\mathbb{R}$ to $\mathbb{R}$ defined by $f(x) = x^{1/3} + 1$ and $g(x) = x^2$, then $f \circ g(x) =
(x^2)^{1/3} + 1 = x^{2/3} + 1$, and $g \circ f(x) = (x^{1/3} + 1)^2 = x^{2/3} + 2x^{1/3} + 1$. Note
that the codomain of $f$ must be the domain of $g$ in order for $g \circ f$ to be defined.
This operation is perhaps the basic binary operation of mathematics.


Lemma. Composition satisfies the associative law:

$$f \circ (g \circ h) = (f \circ g) \circ h.$$


Proof. $(f \circ (g \circ h))(x) = f((g \circ h)(x)) = f(g(h(x))) = (f \circ g)(h(x)) =
((f \circ g) \circ h)(x)$ for all $x \in \operatorname{dom} h$. □
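
The lemma can be watched at work on finite maps. In the sketch below, which is illustrative only, maps are stored as Python dicts and the helper compose and the three sample maps are our own hypothetical choices; the lookup g[f[x]] succeeds only when the values of f lie in the domain of g.

```python
def compose(g, f):
    """Return g o f for maps given as dicts: (g o f)(x) = g(f(x))."""
    return {x: g[f[x]] for x in f}

# Three composable sample maps, chosen only for illustration.
h = {1: 'a', 2: 'b', 3: 'a'}
g = {'a': 10, 'b': 20}
f = {10: 'ten', 20: 'twenty'}

left = compose(f, compose(g, h))    # f o (g o h)
right = compose(compose(f, g), h)   # (f o g) o h
```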


If $A$ is a set, the identity map $I_A: A \to A$ is the mapping taking every
$x \in A$ to itself. Thus $I_A = \{\langle x, x \rangle : x \in A\}$. If $f$ maps $A$ into $B$, then clearly

$$f \circ I_A = f = I_B \circ f.$$


If $g: B \to A$ is such that $g \circ f = I_A$, then we say that $g$ is a left inverse of $f$ and
that $f$ is a right inverse of $g$.


Lemma. If the mapping $f: A \to B$ has both a right inverse $h$ and a left
inverse $g$, they must necessarily be equal.


Proof. This is just algebraic juggling and works for any associative operation.
We have

$$h = I_A \circ h = (g \circ f) \circ h = g \circ (f \circ h) = g \circ I_B = g. \quad \square$$




In this case we call the uniquely determined map $g: B \to A$ such that
$f \circ g = I_B$ and $g \circ f = I_A$ the inverse of $f$. We then have:


Theorem. A mapping $f: A \to B$ has an inverse if and only if it is bijective,
in which case its inverse is its relational inverse $f^{-1}$.


Proof. If $f$ is bijective, then the relational inverse $f^{-1}$ is a function from $B$ to $A$,
and the equations $f \circ f^{-1} = I_B$ and $f^{-1} \circ f = I_A$ are obvious. On the other
hand, if $f \circ g = I_B$, then $f$ is surjective, since then every $y$ in $B$ can be written
$y = f(g(y))$. And if $g \circ f = I_A$, then $f$ is injective, for then the equation
$f(x) = f(y)$ implies that $x = g(f(x)) = g(f(y)) = y$. Thus $f$ is bijective if it
has an inverse. □


Now let $\mathfrak{S}(A)$ be the set of all bijections $f: A \to A$. Then $\mathfrak{S}(A)$ is closed
under the binary operation of composition and


1) $f \circ (g \circ h) = (f \circ g) \circ h$ for all $f, g, h \in \mathfrak{S}(A)$;
2) there exists a unique $I \in \mathfrak{S}(A)$ such that $f \circ I = I \circ f = f$ for all $f \in \mathfrak{S}(A)$;
3) for each $f \in \mathfrak{S}(A)$ there exists a unique $g \in \mathfrak{S}(A)$ such that $f \circ g = g \circ f = I$.


Any set $G$ closed under a binary operation having these properties is called
a group with respect to that operation. Thus $\mathfrak{S}(A)$ is a group with respect to
composition.
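
For a small finite set the three group properties can be verified by exhaustion. The following sketch is an illustration under our own assumptions: A is taken to be {0, 1, 2}, each bijection is stored as a tuple p with p[x] the image of x, and the names G, compose, and identity are hypothetical.

```python
from itertools import permutations

A = (0, 1, 2)
# All bijections of A onto itself, each stored as a tuple p with p[x] = image of x.
G = set(permutations(A))

def compose(p, q):
    # (p o q)(x) = p(q(x))
    return tuple(p[q[x]] for x in A)

identity = tuple(A)

closed = all(compose(p, q) in G for p in G for q in G)
associative = all(compose(p, compose(q, r)) == compose(compose(p, q), r)
                  for p in G for q in G for r in G)
has_identity = all(compose(p, identity) == p == compose(identity, p) for p in G)
# Each p has exactly one two-sided inverse in G.
has_inverses = all(
    sum(1 for q in G if compose(p, q) == identity == compose(q, p)) == 1
    for p in G)
```

A check of this kind is of course no substitute for the proofs above; it only exhibits the properties for one three-element set.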

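Composition can also be defined for relations as follows. If $R \subset A \times B$ and
$S \subset B \times C$, then $S \circ R \subset A \times C$ is defined by

$$\langle x, z \rangle \in S \circ R \iff (\exists y^{\in B})(\langle x, y \rangle \in R \;\&\; \langle y, z \rangle \in S).$$

If $R$ and $S$ are mappings, this definition agrees with our earlier one.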


10. DUALITY 


There is another elementary but important phenomenon called duality which 
occurs in practically all branches of mathematics. Let $F: A \times B \to C$ be any
function of two variables. It is obvious that if $x$ is held fixed, then $F(x, y)$ is a
function of the one variable $y$. That is, for each fixed $x$ there is a function
$h^x: B \to C$ defined by $h^x(y) = F(x, y)$. Then $x \mapsto h^x$ is a mapping $\varphi$ of $A$ into
$C^B$. Similarly, each $y \in B$ yields a function $g_y \in C^A$, where $g_y(x) = F(x, y)$,
and $y \mapsto g_y$ is a mapping $\theta$ from $B$ to $C^A$.

Now suppose conversely that we are given a mapping $\varphi: A \to C^B$. For each
$x \in A$ we designate the corresponding value of $\varphi$ in index notation as $h^x$, so
that $h^x$ is a function from $B$ to $C$, and we define $F: A \times B \to C$ by $F(x, y) =
h^x(y)$. We are now back where we started. Thus the mappings $\varphi: A \to C^B$,
$F: A \times B \to C$, and $\theta: B \to C^A$ are equivalent, and can be thought of as three
different ways of viewing the same phenomenon. The extreme mappings $\varphi$ and
$\theta$ will be said to be dual to each other.
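
Readers who program will recognize the passage between $F$ and $\varphi$ as what is often called currying. The sketch below is offered only as an illustration: the names phi, theta, and uncurry, and the sample function F, are our own, and Python closures stand in for the functions $h^x$ and $g_y$.

```python
def phi(F):
    # x |-> h^x, where h^x(y) = F(x, y)
    return lambda x: (lambda y: F(x, y))

def theta(F):
    # y |-> g_y, where g_y(x) = F(x, y)
    return lambda y: (lambda x: F(x, y))

def uncurry(p):
    # Recover F from phi by F(x, y) = h^x(y).
    return lambda x, y: p(x)(y)

def F(x, y):
    # A hypothetical function of two variables.
    return 10 * x + y
```

Applying uncurry to phi(F) returns a function with the same values as F, which is the statement that we are “back where we started”.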




The mapping $\varphi$ is the indexed family of functions $\{h^x : x \in A\} \subset C^B$. Now
suppose that $\mathcal{F} \subset C^B$ is an unindexed collection of functions on $B$ into $C$, and
define $F: \mathcal{F} \times B \to C$ by $F(f, y) = f(y)$. Then $\theta: B \to C^{\mathcal{F}}$ is defined by $g_y(f) =
f(y)$. What is happening here is simply that in the expression $f(y)$ we regard both
symbols as variables, so that $f(y)$ is a function on $\mathcal{F} \times B$. Then when we hold $y$
fixed, we have a function on $\mathcal{F}$ mapping $\mathcal{F}$ into $C$.

We shall see some important applications of this duality principle as our 
subject develops. For example, an $m \times n$ matrix is a function $\mathbf{t} = \{t_{ij}\}$ in
$\mathbb{R}^{\bar{m} \times \bar{n}}$. We picture the matrix as a rectangular array of numbers, where ‘$i$’ is
the row index and ‘$j$’ is the column index, so that $t_{ij}$ is the number at the
intersection of the $i$th row and the $j$th column. If we hold $i$ fixed, we get the $n$-tuple
forming the $i$th row, and the matrix can therefore be interpreted as an $m$-tuple
of row $n$-tuples. Similarly (dually), it can be viewed as an $n$-tuple of column
$m$-tuples.

In the same vein, an $n$-tuple $\langle f_1, \ldots, f_n \rangle$ of functions from $A$ to $B$ can
be regarded as a single $n$-tuple-valued function from $A$ to $B^{\bar{n}}$,

$$a \mapsto \langle f_1(a), \ldots, f_n(a) \rangle.$$


In a somewhat different application, duality will allow us to regard a finite- 
dimensional vector space V as being its own second conjugate space (V*)*. 

It is instructive to look at elementary Euclidean geometry from this point 
of view. Today we regard a straight line as being a set of geometric points. 
An older and more neutral view is to take points and lines as being two different 
kinds of primitive objects. Accordingly, let $A$ be the set of all points (so that $A$
is the Euclidean plane as we now view it), and let $B$ be the set of all straight lines.
Let $F$ be the incidence function: $F(p, l) = 1$ if $p$ and $l$ are incident ($p$ is “on” $l$,
$l$ is “on” $p$) and $F(p, l) = 0$ otherwise. Thus $F$ maps $A \times B$ into $\{0, 1\}$. Then
for each $l \in B$ the function $g_l(p) = F(p, l)$ is the characteristic function of the
set of points that we think of as being the line $l$ ($g_l(p)$ has the value 1 if $p$ is on $l$
and 0 if $p$ is not on $l$.) Thus each line determines the set of points that are on it.
But, dually, each point $p$ determines the set of lines $l$ “on” $p$, through its
characteristic function $h^p(l)$. Thus, in complete duality we can regard a line as being
a set of points and a point as being a set of lines. This duality aspect of geometry
is basic in projective geometry.

It is sometimes awkward to invent new notation for the “partial” function 
obtained by holding a variable fixed in a function of several variables, as we did 
above when we set $g_y(x) = F(x, y)$, and there is another device that is frequently
useful in this situation. This is to put a dot in the position of the “varying
variable”. Thus $F(a, \cdot)$ is the function of one variable obtained from $F(x, y)$
by holding $x$ fixed at the value $a$, so that in our beginning discussion of duality
we have

$$h^x = F(x, \cdot), \qquad g_y = F(\cdot, y).$$


If $f$ is a function of one variable, we can then write $f = f(\cdot)$, and so express the
above equations also as $h^x(\cdot) = F(x, \cdot)$, $g_y(\cdot) = F(\cdot, y)$. The flaw in this notation
is that we can’t indicate substitution without losing meaning. Thus the value
of the function $F(x, \cdot)$ at $b$ is $F(x, b)$, but from this evaluation we cannot read
backward and tell what function was evaluated. We are therefore forced to
some such cumbersome notation as $F(x, \cdot)|_b$, which can get out of hand.
Nevertheless, the dot device is often helpful when it can be used without evaluation
difficulties. In addition to eliminating the need for temporary notation, as
mentioned above, it can also be used, in situations where it is strictly speaking
superfluous, to direct the eye at once to the position of the variable.

For example, later on $D_\xi F$ will designate the directional derivative of the
function $F$ in the (fixed) direction $\xi$. This is a function whose value at $\alpha$ is
$D_\xi F(\alpha)$, and the notation $D_\xi F(\cdot)$ makes this implicitly understood fact explicit.


11. THE BOOLEAN OPERATIONS


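Let $S$ be a fixed domain, and let $\mathcal{F}$ be a family of subsets of $S$. The union of $\mathcal{F}$,
or the union of all the sets in $\mathcal{F}$, is the set of all elements belonging to at least one
set in $\mathcal{F}$. We designate the union $\bigcup \mathcal{F}$ or $\bigcup_{A \in \mathcal{F}} A$, and thus we have

$$\bigcup \mathcal{F} = \{x : (\exists A^{\in \mathcal{F}})(x \in A)\}, \qquad y \in \bigcup \mathcal{F} \iff (\exists A^{\in \mathcal{F}})(y \in A).$$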


We often consider the family $\mathcal{F}$ to be indexed. That is, we assume given a set $I$
(the set of indices) and a surjective mapping $i \mapsto A_i$ from $I$ to $\mathcal{F}$, so that $\mathcal{F} =
\{A_i : i \in I\}$. Then the union of the indexed collection is designated $\bigcup_{i \in I} A_i$ or
$\bigcup \{A_i : i \in I\}$. The device of indices has both technical and psychological
advantages, and we shall generally use it.

If $\mathcal{F}$ is finite, and either it or the index set is listed, then a different notation
is used for its union. If $\mathcal{F} = \{A, B\}$, we designate the union $A \cup B$, a notation
that displays the listed names. Note that here we have $x \in A \cup B \iff x \in A$ or
$x \in B$. If $\mathcal{F} = \{A_i : i = 1, \ldots, n\}$, we generally write ‘$A_1 \cup A_2 \cup \cdots \cup A_n$’
or ‘$\bigcup_{i=1}^{n} A_i$’ for $\bigcup \mathcal{F}$.

The intersection of the indexed family $\{A_i\}_{i \in I}$, designated $\bigcap_{i \in I} A_i$, is the
set of all points that lie in every $A_i$. Thus

$$x \in \bigcap_{i \in I} A_i \iff (\forall i^{\in I})(x \in A_i).$$


For an unindexed family $\mathcal{F}$ we use the notation $\bigcap \mathcal{F}$ or $\bigcap_{A \in \mathcal{F}} A$, and if $\mathcal{F} =
\{A, B\}$, then $\bigcap \mathcal{F} = A \cap B$.

The complement, $A'$, of a subset $A$ of $S$ is the set of elements $x \in S$ not in
$A$: $A' = \{x^{\in S} : x \notin A\}$. The law of De Morgan states that the complement of
an intersection is the union of the complements:

$$\Big(\bigcap_{i \in I} A_i\Big)' = \bigcup_{i \in I} (A_i').$$


This is an immediate consequence of the rule for negating quantifiers. It is the
equivalence between ‘not always in’ and ‘sometimes not in’: $[\sim(\forall i)(x \in A_i) \iff
(\exists i)(x \notin A_i)]$ says exactly that

$$x \in \Big(\bigcap_i A_i\Big)' \iff x \in \bigcup_i (A_i').$$


If we set $B_i = A_i'$ and take complements again, we obtain the dual form:

$$\Big(\bigcup_{i \in I} B_i\Big)' = \bigcap_{i \in I} (B_i').$$
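Other principles of quantification yield the laws

$$B \cap \Big(\bigcup_{i \in I} A_i\Big) = \bigcup_{i \in I} (B \cap A_i)$$

from $P \;\&\; (\exists x)Q(x) \iff (\exists x)(P \;\&\; Q(x))$,

$$B \cup \Big(\bigcap_{i \in I} A_i\Big) = \bigcap_{i \in I} (B \cup A_i),$$
$$B \cap \Big(\bigcap_{i \in I} A_i\Big) = \bigcap_{i \in I} (B \cap A_i),$$
$$B \cup \Big(\bigcup_{i \in I} A_i\Big) = \bigcup_{i \in I} (B \cup A_i).$$

In the case of two sets, these laws imply the following familiar laws of set algebra:

$$(A \cup B)' = A' \cap B', \qquad (A \cap B)' = A' \cup B' \quad \text{(De Morgan)},$$
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C),$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$

Even here, thinking in terms of indices makes the laws more intuitive. Thus

$$(A_1 \cap A_2)' = A_1' \cup A_2'$$

is obvious when thought of as the equivalence between ‘not always in’ and
‘sometimes not in’.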
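
These laws can be spot-checked on a small indexed family. The sketch below is illustrative only: the domain S, the family, the set B, and the helper names are our own hypothetical choices, and the intersection is taken inside the fixed domain S so that it is meaningful for any family.

```python
# A fixed domain S, an indexed family of subsets, and one more subset B.
S = set(range(10))
family = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}, 3: {3, 5, 7}}
B = {1, 3, 8}

def complement(X):
    return S - X

def union(sets):
    return set().union(*sets)

def intersection(sets):
    # Intersection within the fixed domain S.
    result = set(S)
    for X in sets:
        result &= X
    return result

A = list(family.values())
de_morgan = complement(intersection(A)) == union(complement(X) for X in A)
de_morgan_dual = complement(union(A)) == intersection(complement(X) for X in A)
distributive = (B & union(A)) == union(B & X for X in A)
distributive_dual = (B | intersection(A)) == intersection(B | X for X in A)
```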

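The family $\mathcal{F}$ is disjoint if distinct sets in $\mathcal{F}$ have no elements in common, i.e.,
if $(\forall X, Y^{\in \mathcal{F}})(X \neq Y \Rightarrow X \cap Y = \emptyset)$. For an indexed family $\{A_i\}_{i \in I}$ the
condition becomes $i \neq j \Rightarrow A_i \cap A_j = \emptyset$. If $\mathcal{F} = \{A, B\}$, we simply say that
$A$ and $B$ are disjoint.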

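Given $f: U \to V$ and an indexed family $\{B_i\}$ of subsets of $V$, we have the
following important identities:

$$f^{-1}\Big[\bigcup_i B_i\Big] = \bigcup_i f^{-1}[B_i], \qquad f^{-1}\Big[\bigcap_i B_i\Big] = \bigcap_i f^{-1}[B_i],$$

and, for a single set $B \subset V$,

$$f^{-1}[B'] = (f^{-1}[B])'.$$

For example,

$$x \in f^{-1}\Big[\bigcap_i B_i\Big] \iff f(x) \in \bigcap_i B_i \iff (\forall i)(f(x) \in B_i)$$
$$\iff (\forall i)(x \in f^{-1}[B_i]) \iff x \in \bigcap_i f^{-1}[B_i].$$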




The first, but not the other two, of the three identities above remains valid
when $f$ is replaced by any relation $R$. It follows from the commutative law,
$(\exists x)(\exists y)A \iff (\exists y)(\exists x)A$. The second identity fails for a general $R$ because
‘$(\exists x)(\forall y)$’ and ‘$(\forall y)(\exists x)$’ have different meanings.
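
A finite example makes both points concrete. In the sketch below, which is an illustration under our own assumptions, the function f sends each of 0, ..., 8 to its remainder modulo 3, and R is a two-pair relation that is not a function; all names are hypothetical.

```python
def preimage(f, B):
    """For a function f stored as a dict, return the set of x with f[x] in B."""
    return {x for x in f if f[x] in B}

f = {x: x % 3 for x in range(9)}     # U = {0, ..., 8}, V = {0, 1, 2}
U, V = set(f), {0, 1, 2}
B1, B2 = {0, 1}, {1, 2}

union_ok = preimage(f, B1 | B2) == preimage(f, B1) | preimage(f, B2)
inter_ok = preimage(f, B1 & B2) == preimage(f, B1) & preimage(f, B2)
compl_ok = preimage(f, V - B1) == U - preimage(f, B1)

# A relation that is not a function: 0 is related to both 'a' and 'b'.
R = {(0, 'a'), (0, 'b')}

def rel_preimage(rel, B):
    # The set of x related by rel to at least one element of B.
    return {x for (x, y) in rel if y in B}

# The intersection identity fails for R: the left side is empty, the right is {0}.
inter_fails = (rel_preimage(R, {'a'} & {'b'})
               != rel_preimage(R, {'a'}) & rel_preimage(R, {'b'}))
```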


12. PARTITIONS AND EQUIVALENCE RELATIONS 


A partition of a set $A$ is a disjoint family $\mathcal{F}$ of sets whose union is $A$. We call the
elements of $\mathcal{F}$ ‘fibers’, and we say that $\mathcal{F}$ fibers $A$ or is a fibering of $A$. For example,
the set of straight lines parallel to a given line in the Euclidean plane is a fibering
of the plane. If ‘$\bar{x}$’ designates the unique fiber containing the point $x$, then
$x \mapsto \bar{x}$ is a surjective mapping $\pi: A \to \mathcal{F}$ which we call the projection of $A$ on $\mathcal{F}$.
Passing from a set $A$ to a fibering of $A$ is one of the principal ways of forming
new mathematical objects.

Any function $f$ automatically fibers its domain into sets on which $f$ is constant.
If $A$ is the Euclidean plane and $f(p)$ is the $x$-coordinate of the point $p$ in
some coordinate system, then $f$ is constant on each vertical line; more exactly,
$f^{-1}(x)$ is a vertical line for every $x$ in $\mathbb{R}$. Moreover, $x \mapsto f^{-1}(x)$ is a bijection
from $\mathbb{R}$ to the set of all fibers (vertical lines). In general, if $f: A \to B$ is any
surjective mapping, and if for each value $y$ in $B$ we set

$$A_y = f^{-1}(y) = \{x \in A : f(x) = y\},$$

then $\mathcal{F} = \{A_y : y \in B\}$ is a fibering of $A$ and $\varphi: y \mapsto A_y$ is a bijection from
$B$ to $\mathcal{F}$. Also $\varphi \circ f$ is the projection $\pi: A \to \mathcal{F}$, since $\varphi \circ f(x) = \varphi(f(x))$ is the
set $\bar{x}$ of all $z$ in $A$ such that $f(z) = f(x)$.

The above process of generating a fibering of A from a function on A is 
relatively trivial. A more important way of obtaining a fibering of A is from 
an equality-like relation on A called an equivalence relation. An equivalence 
relation $\sim$ on $A$ is a binary relation which is reflexive ($x \sim x$ for every $x \in A$),
symmetric ($x \sim y \Rightarrow y \sim x$), and transitive ($x \sim y$ and $y \sim z \Rightarrow x \sim z$). Every
fibering $\mathcal{F}$ of $A$ generates a relation $\sim$ by the stipulation that $x \sim y$ if and only if
$x$ and $y$ are in the same fiber, and obviously $\sim$ is an equivalence relation. The
most important fact to be established in this section is the converse. 


Theorem. Every equivalence relation ~ on A is the equivalence relation 
of a fibering. 


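Proof. We obviously have to define $\bar{x}$ as the set of elements $y$ equivalent to $x$,
$\bar{x} = \{y : y \sim x\}$, and our problem is to show that the family $\mathcal{F}$ of all subsets of $A$
obtained this way is a fibering.

The reflexive, symmetric, and transitive laws become

$$x \in \bar{x}, \qquad x \in \bar{y} \Rightarrow y \in \bar{x}, \qquad \text{and} \qquad x \in \bar{y} \text{ and } y \in \bar{z} \Rightarrow x \in \bar{z}.$$

Reflexivity thus implies that $\mathcal{F}$ covers $A$. Transitivity says that if $y \in \bar{z}$, then
$x \in \bar{y} \Rightarrow x \in \bar{z}$; that is, if $y \in \bar{z}$, then $\bar{y} \subset \bar{z}$. But also, if $y \in \bar{z}$, then $z \in \bar{y}$ by
symmetry, and so $\bar{z} \subset \bar{y}$. Thus $y \in \bar{z}$ implies $\bar{y} = \bar{z}$. Therefore, if two of our
sets $\bar{a}$ and $\bar{b}$ have a point $x$ in common, then $\bar{a} = \bar{x} = \bar{b}$. In other words, if $\bar{a}$ is
not the set $\bar{b}$, then $\bar{a}$ and $\bar{b}$ are disjoint, and we have a fibering. □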


The fundamental role this argument plays in mathematics is due to the fact
that in many important situations equivalence relations occur as the primary 
object, and then are used to define partitions and functions. We give two 
examples. 

Let $\mathbb{Z}$ be the integers (positive, negative, and zero). A fraction ‘$m/n$’ can
be considered an ordered pair $\langle m, n \rangle$ of integers with $n \neq 0$. The set of all
fractions is thus $\mathbb{Z} \times (\mathbb{Z} - \{0\})$. Two fractions $\langle m, n \rangle$ and $\langle p, q \rangle$ are
“equal” if and only if $mq = np$, and equality is checked to be an equivalence
relation. The equivalence class $\overline{\langle m, n \rangle}$ is the object taken to be the rational
number $m/n$. Thus the rational number system $\mathbb{Q}$ is the set of fibers in a
partition of $\mathbb{Z} \times (\mathbb{Z} - \{0\})$.

Next, we choose a fixed integer $p \in \mathbb{Z}$ and define a relation $E$ on $\mathbb{Z}$ by
$mEn \iff p$ divides $m - n$. Then $E$ is an equivalence relation, and the set $\mathbb{Z}_p$ of
its equivalence classes is called the integers modulo $p$. It is easy to see that $mEn$
if and only if $m$ and $n$ have the same remainder when divided by $p$, so that in
this case there is an easily calculated function $f$, where $f(m)$ is the remainder
after dividing $m$ by $p$, which defines the fibering. The set of possible remainders
is $\{0, 1, \ldots, p - 1\}$, so that $\mathbb{Z}_p$ contains $p$ elements.
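
The passage from an equivalence relation to its fibers can be carried out mechanically on a finite set. The sketch below is an illustration under our own assumptions: the helper classes is hypothetical, the relation is congruence modulo p with p = 3, and the infinite set of integers is replaced by the finite stand-in {-6, ..., 6}.

```python
def classes(A, equivalent):
    """Group the elements of a finite set A into the fibers of an
    equivalence relation, given as a two-argument predicate."""
    fibers = []
    for x in A:
        for fiber in fibers:
            # Compare x with one representative of the fiber.
            if equivalent(x, next(iter(fiber))):
                fiber.add(x)
                break
        else:
            fibers.append({x})
    return fibers

p = 3
A = range(-6, 7)   # a finite stand-in for the integers
fibers = classes(A, lambda m, n: (m - n) % p == 0)
```

Comparing with a single representative is justified only because the relation is symmetric and transitive, which is exactly the content of the proof above.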

A function on a set A can be “factored” through a fibering of A by the 
following theorem.


Theorem. Let $g$ be a function on $A$, and let $\mathcal{F}$ be a fibering of $A$. Then $g$
is constant on each fiber of $\mathcal{F}$ if and only if there exists a function $\bar{g}$ on $\mathcal{F}$
such that $g = \bar{g} \circ \pi$.


Proof. If $g$ is constant on each fiber of $\mathcal{F}$, then the association of this unique
value with the fiber defines the function $\bar{g}$, and clearly $g = \bar{g} \circ \pi$. The converse
is obvious. □
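
The construction in the proof can be imitated on the integers modulo p. The sketch below is illustrative only and makes several assumptions of our own: p = 4, the integers are replaced by the finite range -8, ..., 8, each fiber is named by its remainder (so that the projection becomes x ↦ x mod p), and g is the hypothetical function x ↦ x² mod p, which is constant on each fiber.

```python
p = 4
A = range(-8, 9)   # a finite stand-in for the integers

def pi(x):
    # Projection onto the fibering, with each fiber named by its remainder.
    return x % p

def g(x):
    # A function on A that is constant on each fiber.
    return (x * x) % p

# Build g_bar on the fibers by associating with each fiber the value of g on it;
# the assertion would fail if g were not constant on some fiber.
g_bar = {}
for x in A:
    value = g_bar.setdefault(pi(x), g(x))
    assert value == g(x), "g is not constant on the fiber of %r" % x
```

With g_bar in hand, g(x) is recovered as g_bar[pi(x)] for every x in A, which is the factorization asserted by the theorem.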


CHAPTER 1


VECTOR SPACES 


The calculus of functions of more than one variable unites the calculus of one 
variable, which the reader presumably knows, with the theory of vector spaces, 
and the adequacy of its treatment depends directly on the extent to which vector
space theory really is used. The theories of differential equations and differential 
geometry are similarly based on a mixture of calculus and vector space theory. 
Such “vector calculus” and its applications constitute the subject matter of this 
book, and in order for our treatment to be completely satisfactory, we shall 
have to spend considerable time at the beginning studying vector spaces them- 
selves. This we do principally in the first two chapters. The present chapter is 
devoted to general vector spaces and the next chapter to finite-dimensional 
spaces. 
We begin this chapter by introducing the basic concepts of the subject— 
vector spaces, vector subspaces, linear combinations, and linear transforma- 
tions—and then relate these notions to the lines and planes of geometry. Next 
we establish the most elementary formal properties of linear transformations and
Cartesian product vector spaces, and take a brief look at quotient vector spaces.
This brings us to our first major objective, the study of direct sum decomposi- 
tions, which we undertake in the fifth section. The chapter concludes with a 
preliminary examination of bilinearity. 


1. FUNDAMENTAL NOTIONS


Vector spaces and subspaces. The reader probably has already had some
contact with the notion of a vector space. Most beginning calculus texts discuss
geometric vectors, which are represented by “arrows” drawn from a chosen
origin $O$. These vectors are added geometrically by the parallelogram rule:
The sum of the vector $OA$ (represented by the arrow from $O$ to $A$) and the
vector $OB$ is the vector $OP$, where $P$ is the vertex opposite $O$ in the parallelogram
having $OA$ and $OB$ as two sides (Fig. 1.1). Vectors can also be multiplied by
numbers: $x(OA)$ is that vector $OB$ such that $B$ is on the line through $O$ and
$A$, the distance from $O$ to $B$ is $|x|$ times the distance from $O$ to $A$, and $B$ and $A$
are on the same side of $O$ if $x$ is positive, and on opposite sides if $x$ is negative
(Fig. 1.2). These two vector operations satisfy certain laws of algebra,
which we shall soon state in the definition. The geometric proofs of these laws
are generally sketchy, consisting more of plausibility arguments than of airtight
logic. For example, the geometric figure in Fig. 1.3 is the essence of the usual
proof that vector addition is associative. In each case the final vector $OX$ is
represented by the diagonal starting from $O$ in the parallelepiped constructed
from the three edges $OA$, $OB$, and $OC$. The set of all geometric vectors, together
with these two operations and the laws of algebra that they satisfy, constitutes
one example of a vector space. We shall return to this situation in Section 2.

[Fig. 1.3: associativity of vector addition; both $(OA + OB) + OC$ and $OA + (OB + OC)$ equal $OX$.]

The reader may also have seen coordinate triples treated as vectors. In this
system a three-dimensional vector is an ordered triple of numbers $\langle x_1, x_2, x_3 \rangle$
which we think of geometrically as the coordinates of a point in space. Addition
is now algebraically defined,

$$\langle x_1, x_2, x_3 \rangle + \langle y_1, y_2, y_3 \rangle = \langle x_1 + y_1, x_2 + y_2, x_3 + y_3 \rangle,$$

as is multiplication by numbers, $t\langle x_1, x_2, x_3 \rangle = \langle tx_1, tx_2, tx_3 \rangle$. The
vector laws are much easier to prove for these objects, since they are almost
algebraic formalities. The set $\mathbb{R}^3$ of all ordered triples of numbers, together with
these two operations, is a second example of a vector space.




If we think of an ordered triple $\langle x_1, x_2, x_3 \rangle$ as a function $\mathbf{x}$ with domain
the set of integers from 1 to 3, where $x_i$ is the value of the function $\mathbf{x}$ at $i$ (see
Section 0.8), then this vector space suggests a general type called a function
space, which we shall examine after the definition. For the moment we remark
only that we defined the sum of the triple $\mathbf{x}$ and the triple $\mathbf{y}$ as that triple $\mathbf{z}$
such that $z_i = x_i + y_i$ for every $i$.

A vector space, then, is a collection of objects that can be added to cach 
other and multiplied by numbers, subject to certain laws of algebra. In this 
context a number is often called a scalar. 


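Definition. Let V be a set, and let there be given a mapping $<\alpha, \beta> \mapsto
\alpha + \beta$ from $V \times V$ to $V$, called addition, and a mapping $<x, \alpha> \mapsto x\alpha$
from $\mathbb{R} \times V$ to $V$, called multiplication by scalars. Then V is a vector space
with respect to these two operations if:


A1. $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$ for all $\alpha, \beta, \gamma \in V$.

A2. $\alpha + \beta = \beta + \alpha$ for all $\alpha, \beta \in V$.

A3. There exists an element $0 \in V$ such that $\alpha + 0 = \alpha$ for all $\alpha \in V$.

A4. For every $\alpha \in V$ there exists a $\beta \in V$ such that $\alpha + \beta = 0$.


S1. $(xy)\alpha = x(y\alpha)$ for all $x, y \in \mathbb{R}$, $\alpha \in V$.

S2. $(x + y)\alpha = x\alpha + y\alpha$ for all $x, y \in \mathbb{R}$, $\alpha \in V$.

S3. $x(\alpha + \beta) = x\alpha + x\beta$ for all $x \in \mathbb{R}$, $\alpha, \beta \in V$.

S4. $1\alpha = \alpha$ for all $\alpha \in V$.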


In contexts where it is clear (as it generally is) which operations are intended,
we refer simply to the vector space V.

Certain further properties of a vector space follow directly from the axioms.
Thus the zero element postulated in A3 is unique, and for each $\alpha$ the $\beta$ of A4
is unique, and is called $-\alpha$. Also $0\alpha = 0$, $x0 = 0$, and $(-1)\alpha = -\alpha$. These
elementary consequences are considered in the exercises.

Our standard example of a vector space will be the set $V = \mathbb{R}^A$ of all
real-valued functions on a set A under the natural operations of addition of two
functions and multiplication of a function by a number. This generalizes the
example $\mathbb{R}^{\{1,2,3\}} = \mathbb{R}^3$ that we looked at above. Remember that a function f
in $\mathbb{R}^A$ is simply a mathematical object of a certain kind. We are saying that two
of these objects can be added together in a natural way to form a third such
object, and that the set of all such objects then satisfies the above laws for
addition. Of course, $f + g$ is defined as the function whose value at $a$ is
$f(a) + g(a)$, so that $(f + g)(a) = f(a) + g(a)$ for all $a$ in A. For example, in $\mathbb{R}^3$ we
defined the sum $\mathbf{x} + \mathbf{y}$ as that triple whose value at $i$ is $x_i + y_i$ for all $i$. Similarly,
$cf$ is the function defined by $(cf)(a) = c(f(a))$ for all $a$. Laws A1 through S4
follow at once from these definitions and the corresponding laws of algebra for
the real number system. For example, the equation $(s + t)f = sf + tf$ means




that $((s + t)f)(a) = (sf + tf)(a)$ for all $a \in A$. But

$$((s + t)f)(a) = (s + t)(f(a)) = s(f(a)) + t(f(a)) = (sf)(a) + (tf)(a) = (sf + tf)(a),$$

where we have used the definition of scalar multiplication in $\mathbb{R}^A$, the distributive
law in $\mathbb{R}$, the definition of scalar multiplication in $\mathbb{R}^A$, and the definition of
addition in $\mathbb{R}^A$, in that order. Thus we have S2, and the other laws follow
similarly.
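
The verification of S2 can also be spot-checked numerically. The following is a minimal Python sketch, assuming a small finite set A and modeling a function in $\mathbb{R}^A$ as a dictionary. The names `A`, `add`, and `scale` are illustrative and do not come from the text, and a check on sample values is of course no substitute for the proof above.

```python
from fractions import Fraction

# Assumption: a small finite domain A. A "vector" in R^A is modeled
# as a dict assigning a number to each point of A.
A = ["p", "q", "r"]

def add(f, g):
    """Pointwise sum: (f + g)(a) = f(a) + g(a)."""
    return {a: f[a] + g[a] for a in A}

def scale(c, f):
    """Pointwise scalar multiple: (cf)(a) = c * f(a)."""
    return {a: c * f[a] for a in A}

# Sample function and scalars (arbitrary illustrative values).
f = {"p": Fraction(1), "q": Fraction(-2), "r": Fraction(1, 2)}
s, t = Fraction(3), Fraction(-5, 4)

lhs = scale(s + t, f)                 # (s + t) f
rhs = add(scale(s, f), scale(t, f))   # sf + tf
s2_holds = (lhs == rhs)               # law S2 on this sample
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is not disturbed by floating-point rounding.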

The set A can be anything at all. If $A = \mathbb{R}$, then $V = \mathbb{R}^{\mathbb{R}}$ is the vector
space of all real-valued functions of one real variable. If $A = \mathbb{R} \times \mathbb{R}$, then
$V = \mathbb{R}^{\mathbb{R} \times \mathbb{R}}$ is the space of all real-valued functions of two real variables. If
$A = \{1, 2\} = \bar{2}$, then $V = \mathbb{R}^{\bar{2}} = \mathbb{R}^2$ is the Cartesian plane, and if
$A = \{1, \ldots, n\} = \bar{n}$, then $V = \mathbb{R}^n$ is Cartesian n-space. If A contains a single
point, then $\mathbb{R}^A$ is a natural bijective image of $\mathbb{R}$ itself, and of course $\mathbb{R}$ is trivially
a vector space with respect to its own operations.

Now let V be any vector space, and suppose that W is a nonempty subset of
V that is closed under the operations of V. That is, if $\alpha$ and $\beta$ are in W, then so
is $\alpha + \beta$, and if $\alpha$ is in W, then so is $x\alpha$ for every scalar $x$. For example, let V be
the vector space $\mathbb{R}^{[a,b]}$ of all real-valued functions on the closed interval $[a, b] \subset \mathbb{R}$,
and let W be the set $\mathcal{C}([a, b])$ of all continuous real-valued functions on $[a, b]$.
Then W is a subset of V that is closed under the operations of V, since $f + g$
and $cf$ are continuous whenever $f$ and $g$ are. Or let V be Cartesian 2-space $\mathbb{R}^2$,
and let W be the set of ordered pairs $\mathbf{x} = <x_1, x_2>$ such that $x_1 + x_2 = 0$.
Clearly, W is closed under the operations of V.

Such a subset W is always a vector space in its own right. The universally
quantified laws A1, A2, and S1 through S4 hold in W because they hold in the
larger set V. And since there is some $\beta$ in W, it follows that $0 = 0\beta$ is in W
because W is closed under multiplication by scalars. For the same reason, if $\alpha$
is in W, then so is $-\alpha = (-1)\alpha$. Therefore, A3 and A4 also hold, and we see
that W is a vector space. We have proved the following lemma.


Lemma. If W is a nonempty subset of a vector space V which is closed
under the operations of V, then W is itself a vector space.
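
The closure conditions in the lemma can be illustrated on the example of the pairs with $x_1 + x_2 = 0$. The sketch below is only a spot check on a handful of sample vectors and scalars chosen for illustration; the helper names `in_W`, `add`, and `scale` are assumptions of this example, not notation from the text.

```python
from fractions import Fraction
from itertools import product

def in_W(x):
    """Membership test for W = {<x1, x2> : x1 + x2 = 0}."""
    return x[0] + x[1] == 0

def add(x, y):
    """Componentwise sum of two pairs."""
    return (x[0] + y[0], x[1] + y[1])

def scale(c, x):
    """Scalar multiple of a pair."""
    return (c * x[0], c * x[1])

# A few members of W and a few scalars (illustrative choices).
samples = [(Fraction(n), Fraction(-n)) for n in (-2, 0, 1, 3)]
scalars = [Fraction(-1), Fraction(0), Fraction(5, 2)]

closed_under_add = all(in_W(add(x, y)) for x, y in product(samples, samples))
closed_under_scale = all(in_W(scale(c, x)) for c, x in product(scalars, samples))

# The two facts used in the proof: 0 = 0*beta and -alpha = (-1)*alpha lie in W.
zero_in_W = in_W(scale(Fraction(0), samples[0]))
negatives_in_W = all(in_W(scale(Fraction(-1), x)) for x in samples)
```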


We call W a subspace of V. Thus $\mathcal{C}([a, b])$ is a subspace of $\mathbb{R}^{[a,b]}$, and the
pairs $<x_1, x_2>$ such that $x_1 + x_2 = 0$ form a subspace of $\mathbb{R}^2$. Subspaces will
be with us from now to the end.

A subspace of a vector space $\mathbb{R}^A$ is called a function space. In other words, a
function space is a collection of real-valued functions on a common domain
which is closed under addition and multiplication by scalars.

What we have defined so far ought to be called the notion of a real vector
space or a vector space over $\mathbb{R}$. There is an analogous notion of a complex vector
space, for which the scalars are the complex numbers. Then laws S1 through S4
refer to multiplication by complex numbers, and the space $\mathbb{C}^A$ of all
complex-valued functions on A is the standard example. In fact, if the reader knew what
is meant by a field F, we could give a single general definition of a vector space
over F, where scalar multiplication is by the elements of F, and the standard
example is the space $V = F^A$ of all functions from A to F. Throughout this
book it will be understood that a vector space is a real vector space unless
explicitly stated otherwise. However, much of the analysis holds as well for complex
vector spaces, and most of the pure algebra is valid for any scalar field F.


EXERCISES 


1.1 Sketch the geometric figure representing law S3,
$$x(OA + OB) = x(OA) + x(OB),$$
for geometric vectors. Assume that $x > 1$.

1.2 Prove S3 for $\mathbb{R}^3$ using the explicit displayed form $<x_1, x_2, x_3>$ for ordered triples.

1.3 The vector 0 postulated in A3 is unique, as elementary algebraic fiddling will
show. For suppose that $0'$ also satisfies A3. Then
$$\begin{aligned}
0' &= 0' + 0 && \text{(A3 for } 0\text{)} \\
   &= 0 + 0' && \text{(A2)} \\
   &= 0 && \text{(A3 for } 0'\text{).}
\end{aligned}$$
Show by similar algebraic juggling that, given $\alpha$, the $\beta$ postulated in A4 is unique.
This unique $\beta$ is designated $-\alpha$.

1.4 Prove similarly that $0\alpha = 0$, $x0 = 0$, and $(-1)\alpha = -\alpha$.

1.5 Prove that if $x\alpha = 0$, then either $x = 0$ or $\alpha = 0$.

1.6 Prove S1 for a function space $\mathbb{R}^A$. Prove S3.

1.7 Given that $\alpha$ is any vector in a vector space V, show that the set $\{x\alpha : x \in \mathbb{R}\}$
of all scalar multiples of $\alpha$ is a subspace of V.

1.8 Given that $\alpha$ and $\beta$ are any two vectors in V, show that the set of all vectors
$x\alpha + y\beta$, where $x$ and $y$ are any real numbers, is a subspace of V.

1.9 Show that the set of triples $\mathbf{x}$ in $\mathbb{R}^3$ such that $x_1 - x_2 + 2x_3 = 0$ is a subspace
M. If N is the similar subspace $\{\mathbf{x} : x_1 + x_2 + x_3 = 0\}$, find a nonzero vector $\mathbf{a}$ in
$M \cap N$. Show that $M \cap N$ is the set $\{x\mathbf{a} : x \in \mathbb{R}\}$ of all scalar multiples of $\mathbf{a}$.

1.10 Let A be the open interval (0, 1), and let V be $\mathbb{R}^A$. Given a point $x$ in (0, 1),
let $V_x$ be the set of functions in V that have a derivative at $x$. Show that $V_x$ is a
subspace of V.

1.11 For any subsets A and B of a vector space V we define the set sum $A + B$ by
$A + B = \{\alpha + \beta : \alpha \in A \text{ and } \beta \in B\}$. Show that $(A + B) + C = A + (B + C)$.

1.12 If $A \subset V$ and $X \subset \mathbb{R}$, we similarly define $XA = \{x\alpha : x \in X \text{ and } \alpha \in A\}$.
Show that a nonvoid set A is a subspace if and only if $A + A = A$ and $\mathbb{R}A = A$.

1.13 Let V be $\mathbb{R}^2$, and let M be the line through the origin with slope $k$. Let $\mathbf{x}$ be
any nonzero vector in M. Show that M is the subspace $\mathbb{R}\mathbf{x} = \{t\mathbf{x} : t \in \mathbb{R}\}$.




1.14 Show that any other line L with the same slope $k$ is of the form $M + \alpha$ for some $\alpha$.

1.15 Let M be a subspace of a vector space V, and let $\alpha$ and $\beta$ be any two vectors in V.
Given $A = \alpha + M$ and $B = \beta + M$, show that either $A = B$ or $A \cap B = \varnothing$.
Show also that $A + B = (\alpha + \beta) + M$.

1.16 State more carefully and prove what is meant by "a subspace of a subspace is
a subspace".

1.17 Prove that the intersection of two subspaces of a vector space is always itself
a subspace.

1.18 Prove more generally that the intersection $W = \bigcap_{i \in I} W_i$ of any family
$\{W_i : i \in I\}$ of subspaces of V is a subspace of V.

1.19 Let V again be $\mathbb{R}^{(0,1)}$, and let W be the set of all functions $f$ in V such that $f'(x)$
exists for every $x$ in (0, 1). Show that W is the intersection of the collection of subspaces
of the form $V_x$ that were considered in Exercise 1.10.

1.20 Let V be a function space $\mathbb{R}^A$, and for a point $a$ in A let $W_a$ be the set of functions
such that $f(a) = 0$. $W_a$ is clearly a subspace. For a subset $B \subset A$ let $W_B$ be the set
of functions $f$ in V such that $f = 0$ on B. Show that $W_B$ is the intersection $\bigcap_{a \in B} W_a$.

1.21 Supposing again that X and Y are subspaces of V, show that if $X + Y = V$ and
$X \cap Y = \{0\}$, then for every vector $\zeta$ in V there is a unique pair of vectors $\xi \in X$
and $\eta \in Y$ such that $\zeta = \xi + \eta$.

1.22 Show that if X and Y are subspaces of a vector space V, then the union $X \cup Y$
can only be a subspace if either $X \subset Y$ or $Y \subset X$.


Linear combinations and linear span. Because of the commutative and
associative laws for vector addition, the sum of a finite set of vectors is the same for all
possible ways of adding them. For example, the sum of the three vectors
$\alpha_a$, $\alpha_b$, $\alpha_c$ can be calculated in 12 ways, all of which give the same result:

$$(\alpha_a + \alpha_b) + \alpha_c = \alpha_c + (\alpha_a + \alpha_b) = (\alpha_c + \alpha_a) + \alpha_b = \alpha_b + (\alpha_c + \alpha_a), \text{ etc.}$$

Therefore, if $I = \{a, b, c\}$ is the set of indices used, the notation $\sum_{i \in I} \alpha_i$,
which indicates the sum without telling us how we got it, is unambiguous. In
general, for any finite indexed set of vectors $\{\alpha_i : i \in I\}$ there is a uniquely
determined sum vector $\sum_{i \in I} \alpha_i$ which we can compute by ordering and
grouping the $\alpha_i$'s in any way.

The index set $I$ is often a block of integers $\bar{n} = \{1, \ldots, n\}$. In this case
the vectors $\alpha_i$ form an n-tuple $\{\alpha_i\}_1^n$, and unless directed to do otherwise we
would add them in their natural order and write the sum as $\sum_{i=1}^n \alpha_i$. Note
that the way they are grouped is still left arbitrary.

Frequently, however, we have to use indexed sets that are not ordered.
For example, the general polynomial of degree at most 5 in the two variables
's' and 't' is

$$\sum_{i+j \le 5} a_{ij} s^i t^j,$$

and the finite set of monomials $\{s^i t^j\}_{i+j \le 5}$ has no natural order.




* The formal proof that the sum of a finite collection of vectors is
independent of how we add them is by induction. We give it only for the interested
reader.

In order to avoid looking silly, we begin the induction with two vectors,
in which case the commutative law $\alpha_a + \alpha_b = \alpha_b + \alpha_a$ displays the identity of
all possible sums. Suppose then that the assertion is true for index sets having
fewer than $n$ elements, and consider a collection $\{\alpha_i : i \in I\}$ having $n$ members.
Let $\beta$ and $\gamma$ be the sum of these vectors computed in two ways. In the
computation of $\beta$ there was a last addition performed, so that
$\beta = (\sum_{i \in J_1} \alpha_i) + (\sum_{i \in J_2} \alpha_i)$, where $\{J_1, J_2\}$ partitions $I$ and where we can write these two
partial sums without showing how they were formed, since by our inductive
hypothesis all possible ways of adding them give the same result.

Similarly, $\gamma = (\sum_{i \in K_1} \alpha_i) + (\sum_{i \in K_2} \alpha_i)$. Now set

$$L_{jk} = J_j \cap K_k \quad \text{and} \quad \xi_{jk} = \sum_{i \in L_{jk}} \alpha_i,$$

where it is understood that $\xi_{jk} = 0$ if $L_{jk}$ is empty (see Exercise 1.37). Then
$\sum_{i \in J_1} \alpha_i = \xi_{11} + \xi_{12}$ by the inductive hypothesis, and similarly for the other
three sums. Thus

$$\beta = (\xi_{11} + \xi_{12}) + (\xi_{21} + \xi_{22}) = (\xi_{11} + \xi_{21}) + (\xi_{12} + \xi_{22}) = \gamma,$$

which completes our proof. *


A vector $\beta$ is called a linear combination of a subset A of the vector space V
if $\beta$ is a finite sum $\sum x_i \alpha_i$, where the vectors $\alpha_i$ are all in A and the scalars $x_i$
are arbitrary. Thus, if A is the subset $\{t^n\}_0^\infty \subset \mathbb{R}^{\mathbb{R}}$ of all "monomials", then a
function $f$ is a linear combination of the functions in A if and only if $f$ is a
polynomial function $f(t) = \sum_0^n c_i t^i$. If A is finite, it is often useful to take the
indexed set $\{\alpha_i\}$ to be the whole of A, and to simply use a 0-coefficient for any
vector missing from the sum. Thus, if A is the subset $\{\sin t, \cos t, e^t\}$ of $\mathbb{R}^{\mathbb{R}}$,
then we can consider A an ordered triple in the listed ordering, and the function
$3 \sin t - e^t = 3 \cdot \sin t + 0 \cdot \cos t + (-1)e^t$ is the linear combination of the
triple A having the coefficient triple $<3, 0, -1>$.

Consider now the set L of all linear combinations of the two vectors
$<1, 1, 1>$ and $<0, 1, -1>$ in $\mathbb{R}^3$. It is the set of all vectors
$s<1, 1, 1> + t<0, 1, -1> = <s, s + t, s - t>$, where $s$ and $t$ are any real numbers. Thus
$L = \{<s, s + t, s - t> : <s, t> \in \mathbb{R}^2\}$. It will be clear on inspection that
L is closed under addition and scalar multiplication, and therefore is a subspace
of $\mathbb{R}^3$. Also, L contains each of the two given vectors, with coefficient pairs
$<1, 0>$ and $<0, 1>$, respectively. Finally, any subspace M of $\mathbb{R}^3$ which
contains each of the two given vectors will also contain all of their linear
combinations, and so will include L. That is, L is the smallest subspace of $\mathbb{R}^3$ containing
$<1, 1, 1>$ and $<0, 1, -1>$. It is called the linear span of the two vectors, or the
subspace generated by the two vectors. In general, we have the following theorem.




Theorem 1.1. If A is a nonempty subset of a vector space V, then the set
L(A) of all linear combinations of the vectors in A is a subspace, and it is 
the smallest subspace of V which includes the set A. 


Proof. Suppose first that A is finite. We can assume that we have indexed A
in some way, so that $A = \{\alpha_i : i \in I\}$ for some finite index set $I$, and every
element of L(A) is of the form $\sum_{i \in I} x_i \alpha_i$. Then we have

$$\Bigl(\sum x_i \alpha_i\Bigr) + \Bigl(\sum y_i \alpha_i\Bigr) = \sum (x_i + y_i) \alpha_i$$

because the left-hand side becomes $\sum_i (x_i \alpha_i + y_i \alpha_i)$ when it is regrouped by
pairs, and then S2 gives the right-hand side. We also have

$$c\Bigl(\sum x_i \alpha_i\Bigr) = \sum (c x_i) \alpha_i$$

by S3 and mathematical induction. Thus L(A) is closed under addition and
multiplication by scalars and hence is a subspace. Moreover, L(A) contains
each $\alpha_i$ (why?) and so includes A. Finally, if a subspace W includes A, then it
contains each linear combination $\sum x_i \alpha_i$, so it includes L(A). Therefore, L(A)
can be directly characterized as the uniquely determined smallest subspace
which includes the set A.

If A is infinite, we obviously can't use a single finite listing. However, the
sum $(\sum_1^n x_i \alpha_i) + (\sum_1^m y_j \beta_j)$ of two linear combinations of elements of A is
clearly a finite sum of scalars times elements of A. If we wish, we can rewrite it
as $\sum_1^{n+m} x_i \alpha_i$, where we have set $\beta_j = \alpha_{n+j}$ and $y_j = x_{n+j}$ for $j = 1, \ldots, m$.
In any case, L(A) is again closed under addition and multiplication by scalars
and so is a subspace. □
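
For the two vectors $<1, 1, 1>$ and $<0, 1, -1>$ considered before the theorem, membership in the linear span can be decided directly from the form $<s, s + t, s - t>$. The following Python sketch illustrates this under that description; the function names `combo` and `coefficients` are hypothetical helpers introduced here.

```python
from fractions import Fraction

ALPHA = (1, 1, 1)
BETA = (0, 1, -1)

def combo(s, t):
    """Return s*ALPHA + t*BETA, computed componentwise."""
    return tuple(s * a + t * b for a, b in zip(ALPHA, BETA))

def coefficients(v):
    """Return (s, t) with v = s*ALPHA + t*BETA, or None if v is not in the span.

    Since s*ALPHA + t*BETA = <s, s+t, s-t>, the first coordinate fixes s,
    the second then fixes t, and the third must equal s - t.
    """
    s = v[0]
    t = v[1] - v[0]
    return (s, t) if v[2] == s - t else None

v = combo(Fraction(2), Fraction(-3))   # the vector <2, -1, 5>
```

With these definitions, `coefficients(ALPHA)` and `coefficients(BETA)` recover the coefficient pairs $<1, 0>$ and $<0, 1>$, while a vector such as $<0, 0, 1>$ is rejected because its third coordinate does not equal $s - t$.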


We call L(A) the linear span of A. If L(A) = V, we say that A spans V;
V is finite-dimensional if it has a finite spanning set.

If $V = \mathbb{R}^3$, and if $\delta^1$, $\delta^2$, and $\delta^3$ are the "unit points on the axes",
$\delta^1 = <1, 0, 0>$, $\delta^2 = <0, 1, 0>$, and $\delta^3 = <0, 0, 1>$, then $\{\delta^i\}_1^3$ spans V, since
$\mathbf{x} = <x_1, x_2, x_3> = <x_1, 0, 0> + <0, x_2, 0> + <0, 0, x_3> = x_1\delta^1 + x_2\delta^2 + x_3\delta^3
= \sum_1^3 x_i \delta^i$ for every $\mathbf{x}$ in $\mathbb{R}^3$. More generally, if $V = \mathbb{R}^n$ and $\delta^j$ is
the n-tuple having 1 in the jth place and 0 elsewhere, then we have similarly that
$\mathbf{x} = <x_1, \ldots, x_n> = \sum_{i=1}^n x_i \delta^i$, so that $\{\delta^i\}_1^n$ spans $\mathbb{R}^n$. Thus $\mathbb{R}^n$ is
finite-dimensional. In general, a function space on an infinite set A will not be
finite-dimensional. For example, it is true but not obvious that $\mathcal{C}([a, b])$ has no finite
spanning set.


EXERCISES 


1.23 Given $\alpha = <1, 1, 1>$, $\beta = <0, 1, -1>$, $\gamma = <2, 0, 1>$, compute the linear
combinations $\alpha + \beta + \gamma$, $3\alpha - 2\beta + \gamma$, $x\alpha + y\beta + z\gamma$. Find $x$, $y$, and $z$ such that
$x\alpha + y\beta + z\gamma = <0, 0, 1> = \delta^3$. Do the same for $\delta^1$ and $\delta^2$.

1.24 Given $\alpha = <1, 1, 1>$, $\beta = <0, 1, -1>$, $\gamma = <1, 0, 2>$, show that each of
$\alpha$, $\beta$, $\gamma$ is a linear combination of the other two. Show that it is impossible to find
coefficients $x$, $y$, and $z$ such that $x\alpha + y\beta + z\gamma = \delta^1$.




1.25 a) Find the linear combination of the set $A = <t, t^2 - 1, t^2 + 1>$ with
coefficient triple $<2, -1, 1>$. Do the same for $<0, 1, 1>$.
b) Find the coefficient triple for which the linear combination of the triple A
is $(t + 1)^2$. Do the same for 1.
c) Show in fact that any polynomial of degree $\le 2$ is a linear combination of A.

1.26 Find the linear combination $f$ of $\{e^t, e^{-t}\} \subset \mathbb{R}^{\mathbb{R}}$ such that $f(0) = 1$ and $f'(0) = 2$.

1.27 Find a linear combination $f$ of $\sin x$, $\cos x$, and $e^x$ such that $f(0) = 0$, $f'(0) = 1$,
and $f''(0) = 1$.

1.28 Suppose that $a \sin x + b \cos x + c e^x$ is the zero function. Prove that
$a = b = c = 0$.

1.29 Prove that $<1, 1>$ and $<1, 2>$ span $\mathbb{R}^2$.

1.30 Show that the subspace $M = \{\mathbf{x} : x_1 + x_2 = 0\} \subset \mathbb{R}^2$ is spanned by one vector.

1.31 Let M be the subspace $\{\mathbf{x} : x_1 - x_2 + 2x_3 = 0\}$ in $\mathbb{R}^3$. Find two vectors $\mathbf{a}$
and $\mathbf{b}$ in M neither of which is a scalar multiple of the other. Then show that M is
the linear span of $\mathbf{a}$ and $\mathbf{b}$.

1.32 Find the intersection of the linear span of $<1, 1, 1>$ and $<0, 1, -1>$ in $\mathbb{R}^3$
with the coordinate subspace $x_2 = 0$. Exhibit this intersection as a linear span.

1.33 Do the above exercise with the coordinate space replaced by
$M = \{\mathbf{x} : x_1 + x_2 = 0\}$.

1.34 By Theorem 1.1 the linear span L(A) of an arbitrary subset A of a vector space
V has the following two properties:

i) L(A) is a subspace of V which includes A;

ii) If M is any subspace which includes A, then $L(A) \subset M$.

Using only (i) and (ii), show that

a) $A \subset B \Rightarrow L(A) \subset L(B)$;

b) $L(L(A)) = L(A)$.

1.35 Show that

a) if M and N are subspaces of V, then so is $M + N$;

b) for any subsets $A, B \subset V$, $L(A \cup B) = L(A) + L(B)$.

1.36 Remembering (Exercise 1.18) that the intersection of any family of subspaces
is a subspace, show that the linear span L(A) of a subset A of a vector space V is the
intersection of all the subspaces of V that include A. This alternative characterization
is sometimes taken as the definition of linear span.

1.37 By convention, the sum of an empty set of vectors is taken to be the zero vector.
This is necessary if Theorem 1.1 is to be strictly correct. Why? What about the
preceding problem?


Linear transformations. The general function space $\mathbb{R}^A$ and the subspace
$\mathcal{C}([a, b])$ of $\mathbb{R}^{[a,b]}$ both have the property that in addition to being closed under
the vector operations, they are also closed under the operation of multiplication
of two functions. That is, the pointwise product of two functions is again a
function [$(fg)(a) = f(a)g(a)$], and the product of two continuous functions is
continuous. With respect to these three operations, addition, multiplication,
and scalar multiplication, $\mathbb{R}^A$ and $\mathcal{C}([a, b])$ are examples of algebras. If the reader
noticed this extra operation, he may have wondered why, at least in the context
of function spaces, we bother with the notion of vector space. Why not study
all three operations? The answer is that the vector operations are exactly the
operations that are "preserved" by many of the most important mappings of
sets of functions. For example, define $T: \mathcal{C}([a, b]) \to \mathbb{R}$ by $T(f) = \int_a^b f(t)\,dt$.
Then the laws of the integral calculus say that $T(f + g) = T(f) + T(g)$ and
$T(cf) = cT(f)$. Thus T "preserves" the vector operations. Or we can say that T
"commutes" with the vector operations, since plus followed by T equals T
followed by plus. However, T does not preserve multiplication: it is not true in
general that $T(fg) = T(f)T(g)$.

Another example is the mapping $T: \mathbf{x} \mapsto \mathbf{y}$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ defined by
$y_1 = 2x_1 - x_2 + x_3$, $y_2 = x_1 + 3x_2 - 5x_3$, for which we can again verify
that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$ and $T(c\mathbf{x}) = cT(\mathbf{x})$. The theory of the solvability
of systems of linear equations is essentially the theory of such mappings T; thus
we have another important type of mapping that preserves the vector operations
(but not products).
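
The two identities can be checked on sample vectors for any map from $\mathbb{R}^3$ to $\mathbb{R}^2$ whose components are linear expressions in the coordinates. The sketch below does this in Python; the particular coefficients, the sample vectors, and the helper names `add` and `scale` are illustrative assumptions, and the check is a numerical illustration rather than a proof.

```python
from fractions import Fraction

def T(x):
    """A map from R^3 to R^2 given by two linear expressions.

    The coefficients are illustrative; any choice of coefficients
    gives a map satisfying the two identities checked below.
    """
    x1, x2, x3 = x
    return (2 * x1 - x2 + x3, x1 + 3 * x2 - 5 * x3)

def add(u, v):
    """Componentwise sum of two tuples of equal length."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple of a tuple."""
    return tuple(c * a for a in u)

x = (Fraction(1), Fraction(-2), Fraction(1, 3))
y = (Fraction(4), Fraction(0), Fraction(-7, 2))
c = Fraction(5, 6)

additive = T(add(x, y)) == add(T(x), T(y))        # T(x + y) = T(x) + T(y)
homogeneous = T(scale(c, x)) == scale(c, T(x))    # T(cx) = cT(x)
```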

These remarks suggest that we study vector spaces in part so that we can
study mappings which preserve the vector operations. Such mappings are
called linear transformations.


Definition. If V and W are vector spaces, then a mapping $T: V \to W$ is a
linear transformation or a linear map if $T(\alpha + \beta) = T(\alpha) + T(\beta)$ for all
$\alpha, \beta \in V$, and $T(x\alpha) = xT(\alpha)$ for all $\alpha \in V$, $x \in \mathbb{R}$.


These two conditions on T can be combined into the single equation
$$T(x\alpha + y\beta) = xT(\alpha) + yT(\beta) \quad \text{for all } \alpha, \beta \in V \text{ and all } x, y \in \mathbb{R}.$$


Moreover, this equation can be extended to any finite sum by induction, so
that if T is linear, then

$$T\Bigl(\sum x_i \alpha_i\Bigr) = \sum x_i T(\alpha_i)$$

for any linear combination $\sum x_i \alpha_i$. For example, $\int_a^b (\sum_1^n c_i f_i) = \sum_1^n c_i \int_a^b f_i$.


EXERCISES 


1.38 Show that the most general linear map from R to R is multiplication by a con- 
stant. 

1.39 For a fixed $\alpha$ in V the mapping $x \mapsto x\alpha$ from $\mathbb{R}$ to V is linear. Why?

1.40 Why is this true for $\alpha \mapsto x\alpha$ when $x$ is fixed?

1.41 Show that every linear mapping from $\mathbb{R}$ to V is of the form $x \mapsto x\alpha$ for a fixed
vector $\alpha$ in V.




1.42 Show that every linear mapping from $\mathbb{R}^2$ to V is of the form $<x_1, x_2> \mapsto
x_1\alpha_1 + x_2\alpha_2$ for a fixed pair of vectors $\alpha_1$ and $\alpha_2$ in V. What is the range of this mapping?

1.43 Show that the map $f \mapsto \int_a^b f(t)\,dt$ from $\mathcal{C}([a, b])$ to $\mathbb{R}$ does not preserve products.

1.44 Let $g$ be any fixed function in $\mathbb{R}^A$. Prove that the mapping $T: \mathbb{R}^A \to \mathbb{R}^A$
defined by $T(f) = gf$ is linear.

1.45 Let $\varphi$ be any mapping from a set A to a set B. Show that composition by $\varphi$ is
a linear mapping from $\mathbb{R}^B$ to $\mathbb{R}^A$. That is, show that $T: \mathbb{R}^B \to \mathbb{R}^A$ defined by
$T(f) = f \circ \varphi$ is linear.


In order to acquire a supply of examples, we shall find all linear
transformations having $\mathbb{R}^n$ as domain space. It may be well to start by looking at one such
transformation. Suppose we choose some fixed triple of functions $\{f_i\}_1^3$ in the
space $\mathbb{R}^{\mathbb{R}}$ of all real-valued functions on $\mathbb{R}$, say $f_1(t) = \sin t$, $f_2(t) = \cos t$, and
$f_3(t) = e^t = \exp(t)$. Then for each triple of numbers $\mathbf{x} = \{x_i\}_1^3$ in $\mathbb{R}^3$ we have
the linear combination $\sum_{i=1}^3 x_i f_i$ with $\{x_i\}$ as coefficients. This is the element of
$\mathbb{R}^{\mathbb{R}}$ whose value at $t$ is $\sum_1^3 x_i f_i(t) = x_1 \sin t + x_2 \cos t + x_3 e^t$. Different coefficient
triples give different functions, and the mapping $\mathbf{x} \mapsto \sum_{i=1}^3 x_i f_i = x_1 \sin +
x_2 \cos + x_3 \exp$ is thus a mapping from $\mathbb{R}^3$ to $\mathbb{R}^{\mathbb{R}}$. It is clearly linear. If we call
this mapping T, then we can recover the determining triple of functions from T
as the images of the "unit points" $\delta^i$ in $\mathbb{R}^3$; $T(\delta^j) = \sum_i \delta^j_i f_i = f_j$, and so
$T(\delta^1) = \sin$, $T(\delta^2) = \cos$, and $T(\delta^3) = \exp$. We are going to see that every
linear mapping from $\mathbb{R}^3$ to $\mathbb{R}^{\mathbb{R}}$ is of this form.

In the following theorem $\{\delta^i\}_1^n$ is the spanning set for $\mathbb{R}^n$ that we defined
earlier, so that $\mathbf{x} = \sum_1^n x_i \delta^i$ for every n-tuple $\mathbf{x} = <x_1, \ldots, x_n>$ in $\mathbb{R}^n$.


Theorem 1.2. If $\{\beta_j\}_1^n$ is any fixed n-tuple of vectors in a vector space W,
then the "linear combination mapping" $\mathbf{x} \mapsto \sum_1^n x_i \beta_i$ is a linear
transformation T from $\mathbb{R}^n$ to W, and $T(\delta^j) = \beta_j$ for $j = 1, \ldots, n$. Conversely,
if T is any linear mapping from $\mathbb{R}^n$ to W, and if we set $\beta_j = T(\delta^j)$ for
$j = 1, \ldots, n$, then T is the linear combination mapping $\mathbf{x} \mapsto \sum_1^n x_i \beta_i$.


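Proof. The linearity of the linear combination map T follows by exactly the
same argument that we used in Theorem 1.1 to show that L(A) is a subspace.
Thus

$$T(\mathbf{x} + \mathbf{y}) = \sum_1^n (x_i + y_i)\beta_i = \sum_1^n (x_i\beta_i + y_i\beta_i)
= \sum_1^n x_i\beta_i + \sum_1^n y_i\beta_i = T(\mathbf{x}) + T(\mathbf{y}),$$

and

$$T(s\mathbf{x}) = \sum_1^n (s x_i)\beta_i = \sum_1^n s(x_i\beta_i) = s\sum_1^n x_i\beta_i = sT(\mathbf{x}).$$

Also $T(\delta^j) = \sum_{i=1}^n \delta^j_i \beta_i = \beta_j$, since $\delta^j_j = 1$ and $\delta^j_i = 0$ for $i \ne j$.

Conversely, if $T: \mathbb{R}^n \to W$ is linear, and if we set $\beta_j = T(\delta^j)$ for all $j$, then for
any $\mathbf{x} = <x_1, \ldots, x_n>$ in $\mathbb{R}^n$ we have $T(\mathbf{x}) = T(\sum_1^n x_i \delta^i) = \sum_1^n x_i T(\delta^i) =
\sum_1^n x_i \beta_i$. Thus T is the mapping $\mathbf{x} \mapsto \sum_1^n x_i \beta_i$. □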
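
The correspondence in Theorem 1.2 between an n-tuple of vectors and a linear map can be sketched in code. The example below assumes, for concreteness, that W is $\mathbb{R}^2$ and n = 3; the names `delta`, `linear_combination_map`, and `images_of_unit_points`, as well as the sample vectors, are hypothetical choices made for this illustration.

```python
from fractions import Fraction

def delta(j, n):
    """Unit point delta^j in R^n (1-based j): 1 in the jth place, 0 elsewhere."""
    return tuple(Fraction(1 if i == j else 0) for i in range(1, n + 1))

def linear_combination_map(betas):
    """Return the map x -> sum_i x_i * beta_i, each beta_i being a tuple in R^m."""
    m = len(betas[0])
    def L(x):
        return tuple(sum(x_i * b[k] for x_i, b in zip(x, betas)) for k in range(m))
    return L

def images_of_unit_points(T, n):
    """Return the list [T(delta^1), ..., T(delta^n)]."""
    return [T(delta(j, n)) for j in range(1, n + 1)]

# Three sample vectors in W = R^2 (illustrative values).
betas = [(Fraction(2), Fraction(1)),
         (Fraction(-1), Fraction(3)),
         (Fraction(1), Fraction(-5))]

T = linear_combination_map(betas)
recovered = images_of_unit_points(T, 3)   # should reproduce betas
```

Applying the map to the unit points returns the original tuple, which is the content of the statement $T(\delta^j) = \beta_j$.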


This is a tremendously important theorem, simple though it may seem, and
the reader is urged to fix it in his mind. To this end we shall invent some
terminology that we shall stay with for the first three chapters. If $\boldsymbol{\alpha} = \{\alpha_1, \ldots, \alpha_n\}$
is an n-tuple of vectors in a vector space W, let $L_{\boldsymbol{\alpha}}$ be the corresponding linear
combination mapping $\mathbf{x} \mapsto \sum_1^n x_i \alpha_i$ from $\mathbb{R}^n$ to W. Note that the n-tuple $\boldsymbol{\alpha}$
itself is an element of $W^n$. If T is any linear mapping from $\mathbb{R}^n$ to W, we shall call
the n-tuple $\{T(\delta^i)\}_1^n$ the skeleton of T. In these terms the theorem can be restated
as follows.


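Theorem 1.2'. For each n-tuple $\boldsymbol{\alpha}$ in $W^n$, the map $L_{\boldsymbol{\alpha}}: \mathbb{R}^n \to W$ is linear
and its skeleton is $\boldsymbol{\alpha}$. Conversely, if T is any linear map from $\mathbb{R}^n$ to W, then
$T = L_{\boldsymbol{\beta}}$ where $\boldsymbol{\beta}$ is the skeleton of T.


Or again:


Theorem 1.2''. The map $\boldsymbol{\alpha} \mapsto L_{\boldsymbol{\alpha}}$ is a bijection from $W^n$ to the set of all
linear maps T from $\mathbb{R}^n$ to W, and $T \mapsto \text{skeleton}(T)$ is its inverse.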


A linear transformation from a vector space V to the scalar field $\mathbb{R}$ is called
a linear functional on V. Thus $f \mapsto \int_a^b f(t)\,dt$ is a linear functional on
$V = \mathcal{C}([a, b])$. The above theorem is particularly simple for a linear functional F:
since $W = \mathbb{R}$, each vector $\beta_i = F(\delta^i)$ in the skeleton of F is simply a number $b_i$,
and the skeleton $\{b_i\}_1^n$ is thus an element of $\mathbb{R}^n$. In this case we would write
$F(\mathbf{x}) = \sum_1^n b_i x_i$, putting the numerical coefficient '$b_i$' before the variable
'$x_i$'. Thus $F(\mathbf{x}) = 3x_1 - x_2 + 4x_3$ is the linear functional on $\mathbb{R}^3$ with skeleton
$<3, -1, 4>$. The set of all linear functionals on $\mathbb{R}^n$ is in a natural one-to-one
correspondence with $\mathbb{R}^n$ itself; we get $\mathbf{b}$ from F by $b_i = F(\delta^i)$ for all $i$, and we
get F from $\mathbf{b}$ by $F(\mathbf{x}) = \sum b_i x_i$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

We next consider the case where the codomain space of T is a Cartesian
space $\mathbb{R}^m$, and in order to keep the two spaces clear in our minds, we shall, for
the moment, take the domain space to be $\mathbb{R}^3$. Each vector $\beta_j = T(\delta^j)$ in the
skeleton of T is now an m-tuple of numbers. If we picture this m-tuple as a
column of numbers, then the three m-tuples $\beta_j$ can be pictured as a rectangular
array of numbers, consisting of three columns each of $m$ numbers. Let $t_{ij}$ be the
ith number in the jth column. Then the doubly indexed set of numbers $\{t_{ij}\}$ is
called the matrix of the transformation T. We call it an m-by-3 (an $m \times 3$)
matrix because the pictured rectangular array has $m$ rows and three columns.
The matrix determines T uniquely, since its columns form the skeleton of T.
The identity $T(\mathbf{x}) = \sum_1^3 x_j T(\delta^j) = \sum_1^3 x_j \beta_j$ allows the m-tuple $T(\mathbf{x})$ to be
calculated explicitly from $\mathbf{x}$ and the matrix $\{t_{ij}\}$. Picture multiplying the
column m-tuple $\beta_j$ by the scalar $x_j$ and then adding across the three columns at
the ith row, as below:

$$\underbrace{\begin{pmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_m \end{pmatrix}}_{\mathbf{y}}
= x_1 \underbrace{\begin{pmatrix} t_{11} \\ \vdots \\ t_{i1} \\ \vdots \\ t_{m1} \end{pmatrix}}_{\beta_1}
+ x_2 \underbrace{\begin{pmatrix} t_{12} \\ \vdots \\ t_{i2} \\ \vdots \\ t_{m2} \end{pmatrix}}_{\beta_2}
+ x_3 \underbrace{\begin{pmatrix} t_{13} \\ \vdots \\ t_{i3} \\ \vdots \\ t_{m3} \end{pmatrix}}_{\beta_3}$$

Since $t_{ij}$ is the ith number in the m-tuple $\beta_j$, the ith number in the m-tuple
$\sum_{j=1}^3 x_j \beta_j$ is $\sum_{j=1}^3 x_j t_{ij}$. That is, if we let $\mathbf{y}$ be the m-tuple $T(\mathbf{x})$, then

$$y_i = \sum_{j=1}^3 t_{ij} x_j \quad \text{for } i = 1, \ldots, m,$$

and this set of $m$ scalar equations is equivalent to the one-vector equation
$\mathbf{y} = T(\mathbf{x})$.

We can now replace three by $n$ in the above discussion without changing
anything except the diagram, and thus obtain the following specialization of
Theorem 1.2.


Theorem 1.3. Every linear mapping T from $\mathbb{R}^n$ to $\mathbb{R}^m$ determines the
$m \times n$ matrix $\mathbf{t} = \{t_{ij}\}$ having the skeleton of T as its columns, and the
expression of the equation $\mathbf{y} = T(\mathbf{x})$ in linear combination form is equivalent
to the $m$ scalar equations

$$y_i = \sum_{j=1}^n t_{ij} x_j \quad \text{for } i = 1, \ldots, m.$$

Conversely, each $m \times n$ matrix $\mathbf{t}$ determines the linear combination mapping
having the columns of $\mathbf{t}$ as its skeleton, and the mapping $\mathbf{t} \mapsto T$ is therefore
a bijection from the set of all $m \times n$ matrices to the set of all linear maps
from $\mathbb{R}^n$ to $\mathbb{R}^m$.
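
The equivalence between the m scalar equations and the linear combination of columns can be made concrete. The Python sketch below stores a matrix as a list of rows and computes $T(\mathbf{x})$ both ways; the function names and the sample $2 \times 3$ matrix are assumptions of this illustration.

```python
def apply_matrix(t, x):
    """Compute y_i = sum_j t[i][j] * x[j] for an m x n matrix t (list of rows)."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in t]

def columns(t):
    """Return the columns of t, i.e. the skeleton of the associated linear map."""
    return [list(col) for col in zip(*t)]

def column_combination(t, x):
    """Compute sum_j x_j * (jth column of t), the linear combination form of T(x)."""
    y = [0] * len(t)
    for x_j, col in zip(x, columns(t)):
        y = [y_i + x_j * c_i for y_i, c_i in zip(y, col)]
    return y

t = [[2, -1, 1],
     [1, 3, -5]]      # a 2 x 3 matrix with illustrative entries
x = [1, 2, 3]

y_rows = apply_matrix(t, x)          # row-by-row scalar equations
y_cols = column_combination(t, x)    # combination of the columns
```

Both computations give the same 2-tuple, and applying the matrix to a unit point such as `[0, 1, 0]` returns the corresponding column.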


A linear functional F on $\mathbb{R}^n$ is a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^1$, so it must
be expressed by a $1 \times n$ matrix. That is, the n-tuple $\mathbf{b}$ in $\mathbb{R}^n$ which is the skeleton
of F is viewed as a matrix of one row and $n$ columns.

As a final example of linear maps, we look at an important class of special
linear functionals defined on any function space, the so-called coordinate
functionals. If $V = \mathbb{R}^I$ and $i \in I$, then the ith coordinate functional $\pi_i$ is simply
evaluation at $i$, so that $\pi_i(f) = f(i)$. These functionals are obviously linear. In
fact, the vector operations on functions were defined to make them linear; since
$sf + tg$ is defined to be that function whose value at $i$ is $sf(i) + tg(i)$ for all $i$,
we see that $sf + tg$ is by definition that function such that $\pi_i(sf + tg) =
s\pi_i(f) + t\pi_i(g)$ for all $i$!

If V is $\mathbb{R}^n$, then $\pi_i$ is the mapping $\mathbf{x} = <x_1, \ldots, x_n> \mapsto x_i$. In this case
we know from the theorem that $\pi_i$ must be of the form $\pi_i(\mathbf{x}) = \sum_1^n b_j x_j$ for
some n-tuple $\mathbf{b}$. What is $\mathbf{b}$?




The general form of the linearity property, $T(\sum x_i \alpha_i) = \sum x_i T(\alpha_i)$, shows
that $T$ and $T^{-1}$ both carry subspaces into subspaces.


Theorem 1.4. If $T: V \to W$ is linear, then the T-image of the linear span
of any subset $A \subset V$ is the linear span of the T-image of A: $T[L(A)] =
L(T[A])$. In particular, if A is a subspace, then so is $T[A]$. Furthermore, if Y
is a subspace of W, then $T^{-1}[Y]$ is a subspace of V.


Proof. According to the formula $T(\sum x_i \alpha_i) = \sum x_i T(\alpha_i)$, a vector in W is
the T-image of a linear combination on A if and only if it is a linear combination
on $T[A]$. That is, $T[L(A)] = L(T[A])$. If A is a subspace, then $A = L(A)$ and
$T[A] = L(T[A])$, a subspace of W. Finally, if Y is a subspace of W and
$\{\alpha_i\} \subset T^{-1}[Y]$, then $T(\sum x_i \alpha_i) = \sum x_i T(\alpha_i) \in L(Y) = Y$. Thus
$\sum x_i \alpha_i \in T^{-1}[Y]$ and $T^{-1}[Y]$ is its own linear span. □


The subspace $T^{-1}(0) = \{\alpha \in V : T(\alpha) = 0\}$ is called the null space, or
kernel, of T, and is designated $N(T)$ or $\mathcal{N}(T)$. The range of T is the subspace
$T(V)$ of W. It is designated $R(T)$ or $\mathcal{R}(T)$.


Lemma 1.1. A linear mapping T is injective if and only if its null space
is $\{0\}$.

Proof. If T is injective, and if $\alpha \ne 0$, then $T(\alpha) \ne T(0) = 0$ and the null space
accordingly contains only 0. On the other hand, if $N(T) = \{0\}$, then whenever
$\alpha \ne \beta$, we have $\alpha - \beta \ne 0$, $T(\alpha) - T(\beta) = T(\alpha - \beta) \ne 0$, and $T(\alpha) \ne T(\beta)$;
this shows that T is injective. □
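
As a small illustration of the lemma, consider the linear functional on $\mathbb{R}^3$ with skeleton $<3, -1, 4>$ mentioned earlier. Its null space contains nonzero vectors, so by the lemma it cannot be injective. The sketch below exhibits this with one nonzero null vector; the specific vectors `null_vector` and `x` are choices made for this example.

```python
def F(x):
    """The linear functional F(x) = 3*x1 - x2 + 4*x3 on R^3."""
    return 3 * x[0] - x[1] + 4 * x[2]

# A nonzero vector in the null space of F (chosen for illustration).
null_vector = (1, 3, 0)

# Adding a null vector to x does not change the value of F,
# so two distinct vectors have the same image.
x = (2, -1, 5)
shifted = tuple(a + b for a, b in zip(x, null_vector))
same_image = (F(x) == F(shifted)) and (x != shifted)
```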


A linear map $T: V \to W$ which is bijective is called an isomorphism.
Two vector spaces V and W are isomorphic if and only if there exists an
isomorphism between them.

For example, the map $<c_1, \ldots, c_n> \mapsto \sum_{i=0}^{n-1} c_{i+1} t^i$ is an isomorphism of
$\mathbb{R}^n$ with the vector space of all polynomials of degree $< n$.

Isomorphic spaces "have the same form", and are identical as abstract
vector spaces. That is, they cannot be distinguished from each other solely on
the basis of vector properties which they do or do not have.

When a linear transformation is from V to itself, special things can happen.
One possibility is that T can map a vector $\alpha$ essentially to itself, $T(\alpha) = x\alpha$
for some $x$ in $\mathbb{R}$. In this case $\alpha$ is called an eigenvector (proper vector,
characteristic vector), and $x$ is the corresponding eigenvalue.


EXERCISES 


1.46 In the situation of Exercise 1.45, show that T is an isomorphism if $\varphi$ is bijective
by showing that

a) $\varphi$ injective $\Rightarrow$ T surjective,

b) $\varphi$ surjective $\Rightarrow$ T injective.




1.47 Find the linear functional 2 on R? such that (<1, 1>)} = Oandi(<1,2>) = 1. 
That is, find b = <8, b2> in R? such that ? is the linear combination map 


x — byxy + bere. 


1.48 Do the same for i(<2,1>) = —3and (<1,2>) = 4. 

1.49 Find the linear 7: R? > R® such that T(<i,1>) = # and T(<1,2>) = 8. 
That is, find the functions f) (2) and fe(é) such that T is the linear combination map 
x — wifi + refs. 

1.50 Let 7 be the linear map from R* to R? such that T(6)) = <2, —1,1>, T(8?) = 
<1,0,3>. Write down the matrix of T in standard rectangular form. Determine 
whether or not 6! is in the range of T. 


1.51 Let T be the linear map from R? to R® whose matrix is 


1 2 3 

2 QO —1f. 

3 -l 1 
Find T(x) when x = <1,1,0>;do the same forx = <3, —2,1>. 
1.52 let Jf be the linear span of <1, —1,0> and <0,1,1>. Find the subspace 
7 (4f] by finding two vectors spanning it, where 7 is as in the above exercise. 
1.53 Let T be the map <2, y> > <x-+ 2y,y> from R? to itself. Show that Tis 4 
linear combination mapping, and write down its matrix in standard form. 


1.54 Do the same for 7: <x, y,2> — <2 — 2,2 +2, y> from R? to itself. 

1,55 Find a linear transformation T from R? to itself whose range space is the span 
of <1, —1,0> and <—1,0,2>. 

1.56 Find two linear functionals on R4 the intersection of whose null spaces is the 
Hnear span of <1, 1,1, 1> and <1,0,—-1,0>. You now have in hand a Jinear 
transformation whose null space is the above span. What is it? 


1.57 Let $V = \mathcal{C}([a, b])$ be the space of continuous real-valued functions on [a, b],
also designated $\mathcal{C}^0([a, b])$, and let $W = \mathcal{C}^1([a, b])$ be those having continuous first
derivatives. Let $D: W \to V$ be differentiation (Df = f'), and define T on V by
T(f) = F, where $F(x) = \int_a^x f(t)\,dt$. By stating appropriate theorems of the calculus,
show that D and T are linear, T maps into W, and D is a left inverse of T ($D \circ T$ is
the identity on V).

1.58 In the above exercise, identify the range of T and the null space of D. We
know that D is surjective and that T is injective. Why?

1.59 Let V be the linear span of the functions sin x and cos x. Then the operation
of differentiation D is a linear transformation from V to V. Prove that D is an isomorphism
from V to V. Show that $D^2 = -I$ on V.

1.60 a) As the reader would guess, $\mathcal{C}^3(\mathbb{R})$ is the set of real-valued functions on $\mathbb{R}$
having continuous derivatives up to and including the third. Show that $f \mapsto f'''$ is a
surjective linear map T from $\mathcal{C}^3(\mathbb{R})$ to $\mathcal{C}(\mathbb{R})$.

b) For any fixed a in $\mathbb{R}$ show that $f \mapsto \langle f(a), f'(a), f''(a) \rangle$ is an isomorphism
from the null space N(T) to $\mathbb{R}^3$. (Hint: Apply Taylor's formula with remainder.)




1.61 An integral analogue of the matrix equations $y_i = \sum_j t_{ij} x_j$, i = 1, ..., m, is
the equation

$$g(s) = \int_0^1 K(s, t) f(t)\,dt, \qquad s \in [0, 1].$$

Assuming that K(s, t) is defined on the square $[0, 1] \times [0, 1]$ and is continuous as a
function of t for each s, check that $f \mapsto g$ is a linear mapping from $\mathcal{C}([0, 1])$ to $\mathbb{R}^{[0, 1]}$.


1.62 For a finite set $A = \{a_i\}$, Theorem 1.1 is a corollary of Theorem 1.4. Why?

1.63 Show that the inverse of an isomorphism is linear (and hence is an isomorphism).

1.64 Find the eigenvectors and eigenvalues of $T: \mathbb{R}^2 \to \mathbb{R}^2$ if the matrix of T is


Be 


Since every scalar multiple $x\alpha$ of an eigenvector $\alpha$ is clearly also an eigenvector, it will
suffice to find one vector in each "eigendirection". This is a problem in elementary
algebra.


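1.65 Find the eigenvectors and eigenvalues of the transformations T whose matrices
are

$$\begin{bmatrix} 1 & 1 \\ -2 & 0 \end{bmatrix}, \quad \begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ -2 & 2 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 \\ -4 & -2 \end{bmatrix}.$$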

1.66 The five transformations in the above two exercises exhibit four different kinds
of behavior according to the number of distinct eigendirections they have. What are
the possibilities?

1.67 Let V be the vector space of polynomials of degree < 3 and define $T: V \to V$
by $f \mapsto t f'(t)$. Find the eigenvectors and eigenvalues of T.


2. VECTOR SPACES AND GEOMETRY 


The familiar coordinate systems of analytic geometry allow us to consider
geometric entities such as lines and planes in vector settings, and these geometric
notions give us valuable intuitions about vector spaces. Before looking at the
vector forms of these geometric ideas, we shall briefly review the construction of
the coordinate correspondence for three-dimensional Euclidean space. As usual,
the confident reader can skip it.

We start with the line. A coordinate correspondence between a line L and
the real number system $\mathbb{R}$ is determined by choosing arbitrarily on L a zero
point O and a unit point Q distinct from O. Then to each point X on L is assigned
the number x such that |x| is the distance from O to X, measured in terms of
the segment OQ as unit, and x is positive or negative according as X and Q are
on the same side of O or on opposite sides. The mapping $X \mapsto x$ is the coordinate
correspondence. Now consider three-dimensional Euclidean space $\mathbb{E}^3$. We want
to set up a coordinate correspondence between $\mathbb{E}^3$ and the Cartesian vector
space $\mathbb{R}^3$. We first choose arbitrarily a zero point O and three unit points
$Q_1$, $Q_2$, and $Q_3$ in such a way that the four points do not lie in a plane. Each of




the unit points $Q_i$ determines a line $L_i$ through O and a coordinate correspondence
on this line, as defined above. The three lines $L_1$, $L_2$, and $L_3$ are called
the coordinate axes. Consider now any point X in $\mathbb{E}^3$. The plane through X
parallel to $L_2$ and $L_3$ intersects $L_1$ at a point $X_1$, and therefore determines a
number $x_1$, the coordinate of $X_1$ on $L_1$. In a similar way, X determines points
$X_2$ on $L_2$ and $X_3$ on $L_3$ which have coordinates $x_2$ and $x_3$, respectively.
Altogether X determines a triple

$$x = \langle x_1, x_2, x_3 \rangle$$

in $\mathbb{R}^3$, and we have thus defined a mapping $\theta: X \mapsto x$ from $\mathbb{E}^3$ to $\mathbb{R}^3$ (see Fig. 1.4).
We call $\theta$ the coordinate correspondence defined by the axis system. The convention
implicit in our notation above is that $\theta(Y)$ is y, $\theta(A)$ is a, etc. Note that the
unit point $Q_1$ on $L_1$ has the coordinate triple $\delta^1$ = <1, 0, 0>, and similarly, that

$$\theta(Q_2) = \delta^2 = \langle 0, 1, 0 \rangle \quad\text{and}\quad \theta(Q_3) = \delta^3 = \langle 0, 0, 1 \rangle.$$

Fig. 1.4


There are certain basic facts about the coordinate correspondence that have
to be proved as theorems of geometry before the correspondence can be used to
treat geometric questions algebraically. These geometric theorems are quite
tricky, and are almost impossible to discuss adequately on the basis of the usual
secondary school treatment of geometry. We shall therefore simply assume
them. They are:

1) $\theta$ is a bijection from $\mathbb{E}^3$ to $\mathbb{R}^3$.

2) Two line segments AB and XY are equal in length and parallel, and the
direction from A to B is the same as that from X to Y if and only if b - a =
y - x (in the vector space $\mathbb{R}^3$). This relationship between line segments is
important enough to formalize. A directed line segment is a geometric line segment,
together with a choice of one of the two directions along it. If we interpret
AB as the directed line segment from A to B, and if we define the directed line
segments AB and XY to be equivalent (and write AB ~ XY) if they are equal
in length, parallel, and similarly directed, then (2) can be restated:

$$AB \sim XY \iff b - a = y - x.$$

3) If $X \neq O$, then Y is on the line through O and X in $\mathbb{E}^3$ if and only if
y = tx for some t in $\mathbb{R}$. Moreover, this t is the coordinate of Y with respect to X
as unit point on the line through O and X.




Fig. 1.5


4) If the axis system in $\mathbb{E}^3$ is Cartesian, that is, if the axes are mutually
perpendicular and a common unit of distance is used, then the length |OX| of
the segment OX is given by the so-called Euclidean norm on $\mathbb{R}^3$,
$|OX| = (\sum_{i=1}^{3} x_i^2)^{1/2}$. This follows directly from the Pythagorean theorem. Then this
formula and a second application of the Pythagorean theorem to the triangle
OXY imply that the segments OX and OY are perpendicular if and only if the
scalar product $(x, y) = \sum_{i=1}^{3} x_i y_i$ has the value 0 (see Fig. 1.5).


In applying this result, it is useful to note that the scalar product (x, y) is
linear as a function of either vector variable when the other is held fixed. Thus

$$(cx + dy, z) = \sum_{i=1}^{3} (c x_i + d y_i) z_i = c \sum_{i=1}^{3} x_i z_i + d \sum_{i=1}^{3} y_i z_i = c(x, z) + d(y, z).$$

Exactly the same theorems hold for the coordinate correspondence between
the Euclidean plane $\mathbb{E}^2$ and the Cartesian 2-space $\mathbb{R}^2$, except that now, of course,
$(x, y) = \sum_{i=1}^{2} x_i y_i = x_1 y_1 + x_2 y_2$.
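
The Euclidean norm, the perpendicularity test, and the linearity of the scalar product in its first variable can all be checked on sample triples. This is only a sketch: the function names and the particular vectors and scalars below are assumptions chosen for illustration.

```python
import math


def scalar_product(x, y):
    """The scalar product (x, y) = sum of x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """The Euclidean norm |OX| = (sum of x_i^2)^(1/2)."""
    return math.sqrt(scalar_product(x, x))


def perpendicular(x, y):
    """OX and OY are perpendicular exactly when (x, y) = 0."""
    return scalar_product(x, y) == 0


# Hypothetical sample triples and scalars.
x = [1, 2, 2]
y = [2, 1, -2]
z = [3, 0, 4]
c, d = 2, -3

# Linearity in the first variable: (cx + dy, z) = c(x, z) + d(y, z).
lhs = scalar_product([c * xi + d * yi for xi, yi in zip(x, y)], z)
rhs = c * scalar_product(x, z) + d * scalar_product(y, z)

print(norm(x), perpendicular(x, y), lhs == rhs)
```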

We can easily obtain the equations for lines and planes in $\mathbb{E}^3$ from these
basic theorems. First, we see from (2) and (3) that if fixed points A and B are given,
with $A \neq O$, then the line through B parallel to the segment OA contains the
point X if and only if there exists a scalar t such that x - b = ta (see Fig. 1.6).
Therefore, the equation of this line is

$$x = ta + b.$$

Fig. 1.6

This vector equation is equivalent to the three numerical equations
$x_i = a_i t + b_i$, i = 1, 2, 3. These are customarily called the parametric equations of the
line, since they present the coordinate triple x of the varying point X on the line
as functions of the "parameter" t.
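
The parametric description lends itself to a direct computation. The sketch below, with hypothetical function names and sample triples, produces points of the line x = ta + b and recovers the parameter t of a given point, or reports that the point is off the line. Integer coordinates are assumed so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def point_on_line(a, b, t):
    """Return the point x = t*a + b of the line through b parallel to a."""
    return [t * ai + bi for ai, bi in zip(a, b)]


def parameter_of(a, b, x):
    """Return t with x - b = t*a if x lies on the line, otherwise None."""
    # Pick a coordinate in which a is nonzero (a != 0 is assumed, as in the text).
    i = next(k for k, ak in enumerate(a) if ak != 0)
    t = Fraction(x[i] - b[i], a[i])
    return t if point_on_line(a, b, t) == list(x) else None


# Hypothetical direction triple a and base point b.
a = [1, 2, 0]
b = [0, 1, 1]

x = point_on_line(a, b, 3)
print(x, parameter_of(a, b, x), parameter_of(a, b, [3, 7, 2]))
```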




Next, we know that the plane through B perpendicular to the direction of
the segment OA contains the point X if and only if $BX \perp OA$, and it therefore
follows from (2) and (4) that the plane contains X if and only if (x - b, a) = 0
(see Fig. 1.7). But (x - b, a) = (x, a) - (b, a) by the linearity of the scalar
product in its first variable, and if we set l = (b, a), we see that the equation of
the plane is

$$(x, a) = l \quad\text{or}\quad \sum_{i=1}^{3} a_i x_i = l.$$

That is, a point X is on the plane through B perpendicular to the direction of
OA if and only if this equation holds for its coordinate triple x. Conversely,
if $a \neq 0$, then we can retrace the steps taken above to show that the set of points
X in $\mathbb{E}^3$ whose coordinate triples x satisfy (x, a) = l is a plane.
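
As an illustration of the passage from (x - b, a) = 0 to (x, a) = l, the following sketch computes l = (b, a) and tests membership in the plane. The function names and the sample triples are assumptions made for the example.

```python
def scalar_product(x, y):
    """The scalar product (x, y) = sum of x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))


def plane_constant(a, b):
    """Return l = (b, a) for the plane through b perpendicular to a."""
    return scalar_product(b, a)


def on_plane(x, a, l):
    """Test the plane equation (x, a) = l."""
    return scalar_product(x, a) == l


# Hypothetical normal direction a and point b on the plane.
a = [1, 2, 2]
b = [1, 0, 1]
l = plane_constant(a, b)

# x - b = <2, -1, 0> is perpendicular to a, so x should satisfy (x, a) = l.
x = [3, -1, 1]
difference = [xi - bi for xi, bi in zip(x, b)]

print(l, scalar_product(difference, a), on_plane(x, a, l))
```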


Fig. 1.7 


The fact that $\mathbb{R}^3$ has the natural scalar product (x, y) is of course extremely
important, both algebraically and geometrically. However, most vector spaces
do not have natural scalar products, and we shall deliberately neglect scalar
products in our early vector theory (but shall return to them in Chapter 5).
This leads us to seek a different interpretation of the equation $\sum_{i=1}^{3} a_i x_i = l$.
We saw in Section 1 that $x \mapsto \sum_{i=1}^{3} a_i x_i$ is the most general linear functional f on
$\mathbb{R}^3$. Therefore, given any plane M in $\mathbb{E}^3$, there is a nonzero linear functional f
on $\mathbb{R}^3$ and a number l such that the equation of M is f(x) = l. And conversely,
given any nonzero linear functional $f: \mathbb{R}^3 \to \mathbb{R}$ and any $l \in \mathbb{R}$, the locus of
f(x) = l is a plane M in $\mathbb{E}^3$. The reader will remember that we obtain the
coefficient triple a from f by $a_i = f(\delta^i)$, since then
$f(x) = f(\sum_{i=1}^{3} x_i \delta^i) = \sum_{i=1}^{3} x_i f(\delta^i) = \sum_{i=1}^{3} x_i a_i$.
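
The recipe $a_i = f(\delta^i)$ is easy to carry out mechanically. In this sketch the functional f is a hypothetical example written as a Python function, and `coefficients` is an assumed helper name; indices run from 0 rather than 1.

```python
def coefficients(f, n=3):
    """Recover the coefficient tuple a of a linear functional f via a_i = f(delta^i)."""
    deltas = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    return [f(delta) for delta in deltas]


# A hypothetical linear functional on R^3.
def f(x):
    return 4 * x[0] - x[1] + 2 * x[2]


a = coefficients(f)

# f(x) should agree with the linear combination sum of a_i * x_i.
x = [2, 5, -1]
combination = sum(ai * xi for ai, xi in zip(a, x))

print(a, f(x), combination)
```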

Finally, we seek the vector form of the notion of parallel translation. In 
plane geometry when we are considering two congruent figures that are parallel 
and similarly oriented, we often think of obtaining one from the other by “sliding 




the plane along itself” in such a way that all lines remain parallel to their original 
positions. This description of a parallel translation of the plane can be more 
elegantly stated as the condition that every directed line segment slides to an 
equivalent one. If X slides to Y and O slides to B, then OX slides to BY, so 
that OX ~ BY and x = y - b by (2). Therefore, the coordinate form of such
a parallel sliding is the mapping $x \mapsto y = x + b$.

Conversely, for any b in $\mathbb{R}^2$ the plane mapping defined by $x \mapsto y = x + b$
is easily seen to be a parallel translation. These considerations hold equally well
for parallel translations of the Euclidean space $\mathbb{E}^3$.

It is geometrically clear that under a parallel translation planes map to
parallel planes and lines map to parallel lines, and now we can expect an easy
algebraic proof. Consider, for example, the plane M with equation f(x) = l;
let us ask what happens to M under the translation $x \mapsto y = x + b$. Since
x = y - b, we see that a point x is on M if and only if its translate y satisfies
the equation f(y - b) = l or, since f is linear, the equation f(y) = l', where
l' = l + f(b). But this is the equation of a plane N. Thus the translate of M
is the plane N.
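
A small numerical check of the rule l' = l + f(b) may be helpful. The functional, the constant l, the translation vector b, and the sample points below are all hypothetical choices made for this sketch.

```python
def f(x):
    """A hypothetical nonzero linear functional on R^3."""
    return x[0] + 2 * x[1] + 2 * x[2]


def translate(x, b):
    """The parallel translation x -> y = x + b."""
    return [xi + bi for xi, bi in zip(x, b)]


l = 3                       # M is the plane f(x) = l
b = [1, 0, -1]              # translation vector
l_prime = l + f(b)          # N should be the plane f(y) = l'

# A few points chosen on M, and their translates.
points_on_M = [[3, -1, 1], [1, 1, 0], [3, 0, 0]]
translates = [translate(x, b) for x in points_on_M]

print(l_prime, [f(y) for y in translates])
```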

It is natural to transfer all this geometric terminology from sets in $\mathbb{E}^3$
to the corresponding sets in $\mathbb{R}^3$ and therefore to speak of the set of ordered
triples x satisfying f(x) = l as a set of points in $\mathbb{R}^3$ forming a plane in $\mathbb{R}^3$, and
to call the mapping $x \mapsto x + b$ the (parallel) translation of $\mathbb{R}^3$ through b, etc.
Moreover, since $\mathbb{R}^3$ is a vector space, we would expect these geometric ideas to
interplay with vector notions. For instance, translation through b is simply the
operation of adding the constant vector b: $x \mapsto x + b$. Thus if M is a plane, then
the plane N obtained by translating M through b is just the vector set sum
M + b. If the equation of M is f(x) = l, then the plane M goes through 0 if
and only if l = 0, in which case M is a vector subspace of $\mathbb{R}^3$ (the null space of f).
It is easy to see that any plane M is a translate of a plane through 0. Similarly,
the line $\{ta + b : t \in \mathbb{R}\}$ is the translate through b of the line $\{ta : t \in \mathbb{R}\}$, and
this second line is a subspace, the linear span of the one vector a. Thus planes
and lines in $\mathbb{R}^3$ are translates of subspaces.

These notions all carry over to an arbitrary real vector space in a perfectly
satisfactory way and with additional dimensional variety. A plane in $\mathbb{R}^3$
through 0 is a vector space which is two-dimensional in a strictly algebraic sense
which we shall discuss in the next chapter, and a line is similarly one-dimensional.
In $\mathbb{R}^3$ there are no proper subspaces other than planes and lines through 0,
but in a vector space V with dimension n > 3 proper subspaces occur with all
dimensions from 1 to n - 1. We shall therefore use the term "plane" loosely to
refer to any translate of a subspace, whatever its dimension. More properly,
translates of vector subspaces are called affine subspaces.

We shall see that if V is a finite-dimensional space with dimension n, then
the null space of a nonzero linear functional f is always (n - 1)-dimensional, and
therefore it cannot be a Euclidean-like two-dimensional plane except when




n = 3. We use the term hyperplane for such a null space or one of its translates.
Thus, in general, a hyperplane is a set with the equation f(x) = l, where f is
a nonzero linear functional. It is a proper affine subspace (plane) which is maximal
in the sense that the only affine subspace properly including it is the whole
of V. In $\mathbb{R}^3$ hyperplanes are ordinary geometric planes, and in $\mathbb{R}^2$ hyperplanes
are lines!


EXERCISES 


2.1 Assuming the theorem $AB \sim XY \iff b - a = y - x$, show that OC is the sum
of OA and OB, as defined in the preliminary discussion of Section 1, if and only if
c = b + a. Considering also our assumed geometric theorem (3), show that the
mapping $x \mapsto OX$ from $\mathbb{R}^3$ to the vector space of geometric vectors is linear and
hence an isomorphism.

2.2 Let L be the line in the Cartesian plane $\mathbb{R}^2$ with equation $x_2 = 3x_1$. Express L
in parametric form as x = ta for a suitable ordered pair a.

2.3 Let V be any vector space, and let $\alpha$ and $\beta$ be distinct vectors. Show that the
line through $\alpha$ and $\beta$ has the parametric equation

$$\xi = t\beta + (1 - t)\alpha, \qquad t \in \mathbb{R}.$$

Show also that the segment from $\alpha$ to $\beta$ is the image of [0, 1] in the above mapping.


2.4 According to the Pythagorean theorem, a triangle with side lengths a, b, and c
has a right angle at the vertex "opposite c" if and only if $c^2 = a^2 + b^2$.
Prove from this that in a Cartesian coordinate system in $\mathbb{E}^3$ the length |OX| of a
segment OX is given by

$$|OX|^2 = \sum_{i=1}^{3} x_i^2,$$

where $x = \langle x_1, x_2, x_3 \rangle$ is the coordinate triple of the point X. Next use our geometric
theorem (2) to conclude that

$$OX \perp OY \text{ if and only if } (x, y) = 0, \quad\text{where } (x, y) = \sum_{i=1}^{3} x_i y_i.$$

(Use the bilinearity of (x, y) to expand $|X - Y|^2$.)
2.5 More generally, the law of cosines says that in any triangle labeled as indicated,

$$c^2 = a^2 + b^2 - 2ab \cos\theta.$$




Apply this law to the diagram to prove that

$$(x, y) = |x|\,|y| \cos\theta,$$

where (x, y) is the scalar product $\sum_{i=1}^{3} x_i y_i$, $|x| = (x, x)^{1/2} = |OX|$, etc.

2.6 Given a nonzero linear functional $f: \mathbb{R}^3 \to \mathbb{R}$, and given $k \in \mathbb{R}$, show that the
set of points X in $\mathbb{E}^3$ such that f(x) = k is a plane. [Hint: Find a b in $\mathbb{R}^3$ such that
f(b) = k, and throw the equation f(x) = k into the form (x - b, a) = 0, etc.]

2.7 Show that for any b in $\mathbb{R}^3$ the mapping $X \mapsto Y$ from $\mathbb{E}^3$ to itself defined by
y = x + b is a parallel translation. That is, show that if $X \mapsto Y$ and $Z \mapsto W$, then
XZ ~ YW.

2.8 Let M be the set in $\mathbb{R}^3$ with equation $3x_1 - x_2 + x_3 = 2$. Find triplets a and b
such that M is the plane through b perpendicular to the direction of a. What is the
equation of the plane P = M + <1, 2, 1>?

2.9 Continuing the above exercise, what is the condition on the triplet b in order for
N = M + b to pass through the origin? What is the equation of N?

2.10 Show that if the plane M in $\mathbb{R}^3$ has the equation f(x) = l, then M is a translate
of the null space N of the linear functional f. Show that any two translates M and P
of N are either identical or disjoint. What is the condition on the ordered triple b
in order that M + b = M?

2.11 Generalize the above exercise to hyperplanes in $\mathbb{R}^n$.


2.12 Let N be the subspace (plane through the origin) in $\mathbb{R}^3$ with equation f(x) = 0.
Let M and P be any two planes obtained from N by parallel translation. Show that
Q = M + P is a third such plane. If M and P have the equations $f(x) = l_1$ and
$f(x) = l_2$, find the equation for Q.

2.13 If M is the plane in $\mathbb{R}^3$ with equation f(x) = l, and if r is any nonzero number,
show that the set product rM is a plane parallel to M.

2.14 In view of the above two exercises, discuss how we might consider the set of all
parallel translates of the plane N with equation f(x) = 0 as forming a new vector
space.

2.15 Let L be the subspace (line through the origin) in $\mathbb{R}^3$ with parametric equation
x = ta. Discuss the set of all parallel translates of L in the spirit of the above three
exercises.

2.16 The best object to take as "being" the geometric vector AB is the equivalence
class of all directed line segments XY such that XY ~ AB. Assuming whatever you
need from properties (1) through (4), show that this is an equivalence relation on the
set of all directed line segments (Section 0.12).

2.17 Assuming that the geometric vector AB is defined as in the above exercise, show 
that, strictly speaking, it is actually the mapping of the plane (or space) into itself that 
we have called the parallel translation through AB. Show also that AB + CD is the 
composition of the two translations. 




3. PRODUCT SPACES AND HOM(V, W)


Product spaces. If W is a vector space and A is an arbitrary set, then the set
$V = W^A$ of all W-valued functions on A is a vector space in exactly the same
way that $\mathbb{R}^A$ is. Addition is the natural addition of functions, (f + g)(a) =
f(a) + g(a), and, similarly, (xf)(a) = x(f(a)) for every function f and scalar x.
Laws A1 through S4 follow just as before and for exactly the same reasons. For
variety, let us check the associative law for addition. The equation f + (g + h) =
(f + g) + h means that (f + (g + h))(a) = ((f + g) + h)(a) for all $a \in A$.
But

$$\begin{aligned} (f + (g + h))(a) &= f(a) + (g + h)(a) \\ &= f(a) + (g(a) + h(a)) = (f(a) + g(a)) + h(a) \\ &= (f + g)(a) + h(a) = ((f + g) + h)(a), \end{aligned}$$

where the middle equality in this chain of five holds by the associative law for W
and the other four are applications of the definition of addition. Thus the
associative law for addition holds in $W^A$ because it holds in W, and the other
laws follow in exactly the same way. As before, we let $\pi_i$ be evaluation at i,
so that $\pi_i(f) = f(i)$. Now, however, $\pi_i$ is vector valued rather than scalar valued,
because it is a mapping from V to W, and we call it the ith coordinate projection
rather than the ith coordinate functional. Again these maps are all linear.
In fact, as before, the natural vector operations on $W^A$ are uniquely defined by
the requirement that the projections $\pi_i$ all be linear. We call the value f(j) =
$\pi_j(f)$ the jth coordinate of the vector f. Here the analogue of Cartesian n-space
is the set $W^{\bar n}$ of all n-tuples $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle$ of vectors in W; it is also
designated $W^n$. Clearly, $\alpha_j$ is the jth coordinate of the n-tuple $\alpha$.
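
The pointwise operations on $W^A$ can be imitated directly. In this sketch a function in $W^A$ is modeled as a Python dictionary from a small index set A to vectors of $W = \mathbb{R}^2$ stored as tuples; the index set, the sample functions, and all helper names are assumptions made for illustration.

```python
def vec_add(u, v):
    """Addition in W = R^2, with vectors stored as tuples."""
    return tuple(a + b for a, b in zip(u, v))


def vec_scale(x, u):
    """Scalar multiplication in W."""
    return tuple(x * a for a in u)


def add(f, g):
    """(f + g)(a) = f(a) + g(a) for every index a."""
    return {a: vec_add(f[a], g[a]) for a in f}


def scale(x, f):
    """(xf)(a) = x(f(a)) for every index a."""
    return {a: vec_scale(x, f[a]) for a in f}


def projection(i):
    """The coordinate projection pi_i, evaluation at the index i."""
    return lambda f: f[i]


# Hypothetical elements of W^A for the index set A = {"p", "q"}.
f = {"p": (1, 0), "q": (2, 3)}
g = {"p": (0, 1), "q": (-1, 1)}
h = {"p": (4, 4), "q": (0, -2)}

associative = add(f, add(g, h)) == add(add(f, g), h)

# Linearity of pi_q: pi(2f + 3g) = 2 pi(f) + 3 pi(g).
pi = projection("q")
linear = pi(add(scale(2, f), scale(3, g))) == vec_add(vec_scale(2, pi(f)), vec_scale(3, pi(g)))

print(associative, linear)
```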

There is no reason why we must use the same space W at each index, as we
did above. In fact, if $W_1, \ldots, W_n$ are any n vector spaces, then the set of all
n-tuples $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle$ such that $\alpha_j \in W_j$ for j = 1, ..., n is a vector
space under the same definitions of the operations and for the same reasons.
That is, the Cartesian product $W = W_1 \times W_2 \times \cdots \times W_n$ is also a vector
space of vector-valued functions. Such finite products will be very important
to us. Of course, $\mathbb{R}^n$ is the product $\prod_{i=1}^{n} W_i$ with each $W_i = \mathbb{R}$; but $\mathbb{R}^n$ can also
be considered $\mathbb{R}^m \times \mathbb{R}^{n-m}$, or more generally, $\prod_{i=1}^{k} W_i$, where $W_i = \mathbb{R}^{m_i}$ and
$\sum_{i=1}^{k} m_i = n$. However, the most important use of finite product spaces arises
from the fact that the study of certain phenomena on a vector space V may
lead in a natural way to a collection $\{V_i\}_1^n$ of subspaces of V such that V is
isomorphic to the product $\prod_{i=1}^{n} V_i$. Then the extra structure that V acquires
when we regard it as the product space $\prod_{i=1}^{n} V_i$ is used to study the phenomena
in question. This is the theory of direct sums, and we shall investigate it in
Section 5.

Later in the course we shall need to consider a general Cartesian product of
vector spaces. We remind the reader that if $\{W_i : i \in I\}$ is any indexed collection
of vector spaces, then the Cartesian product $\prod_{i \in I} W_i$ of these vector spaces is




defined as the set of all functions f with domain I such that $f(i) \in W_i$ for all
$i \in I$ (see Section 0.8).

The following is a simple concrete example to keep in mind. Let S be the
ordinary unit sphere in $\mathbb{R}^3$, $S = \{x : \sum_{i=1}^{3} x_i^2 = 1\}$, and for each point x on S
let $W_x$ be the subspace of $\mathbb{R}^3$ tangent to S at x. By this we mean the subspace
(plane through 0) parallel to the tangent plane to S at x, so that the translate
$W_x + x$ is the tangent plane (see Fig. 1.8). A function f in the product space
$W = \prod_{x \in S} W_x$ is a function which assigns to each point x on S a vector in $W_x$,
that is, a vector parallel to the tangent plane to S at x. Such a function is called
a vector field on S. Thus the product set W is the set of all vector fields on S,
and W itself is a vector space, as the next theorem states.


Fig. 1.8 


Of course, the jth coordinate projection on $W = \prod_{i \in I} W_i$ is evaluation
at j, $\pi_j(f) = f(j)$, and the natural vector operations on W are uniquely defined
by the requirement that the coordinate projections all be linear. Thus f + g
must be that element of W whose value at j, $\pi_j(f + g)$, is $\pi_j(f) + \pi_j(g)$ =
f(j) + g(j) for all $j \in I$, and similarly for multiplication by scalars.


Theorem 3.1. The Cartesian product of a collection of vector spaces can
be made into a vector space in exactly one way so that the coordinate projections
are all linear.


Proof. With the vector operations determined uniquely as above, the proofs of
A1 through S4 that we sampled earlier hold verbatim. They did not require that
the functions being added have all their values in the same space, but only that
the values at a given domain element i all lie in the same space. □


Hom(V, W). Linear transformations have the simple but important properties
that the sum of two linear transformations is linear and the composition of two 
linear transformations is linear. These imprecise statements are in essence the 
theme of this section, although they need bolstering by conditions on domains 
and codomains. Their proofs are simple formal algebraic arguments, but the 
objects being discussed will increase in conceptual complexity. 




If W is a vector space and A is any set, we know that the space $W^A$ of all
mappings $f: A \to W$ is a vector space of functions (now vector valued) in the
same way that $\mathbb{R}^A$ is. If A is itself a vector space V, we naturally single out for
special study the subset of $W^V$ consisting of all linear mappings. We designate
this subset Hom(V, W). The following elementary theorems summarize its
basic algebraic properties.


Theorem 3.2. Hom(V, W) is a vector subspace of $W^V$.

Proof. The theorem is an easy formality. If S and T are in Hom(V, W), then

$$\begin{aligned} (S + T)(x\alpha + y\beta) &= S(x\alpha + y\beta) + T(x\alpha + y\beta) \\ &= xS(\alpha) + yS(\beta) + xT(\alpha) + yT(\beta) = x(S + T)(\alpha) + y(S + T)(\beta), \end{aligned}$$

so S + T is linear and Hom(V, W) is closed under addition. The reader should
be sure he knows the justification for each step in the above continued equality.
The closure of Hom(V, W) under multiplication by scalars follows similarly,
and since Hom(V, W) contains the zero transformation, and so is nonempty,
it is a subspace. □


Theorem 3.3. The composition of linear maps is linear: if $T \in$ Hom(V, W)
and $S \in$ Hom(W, X), then $S \circ T \in$ Hom(V, X). Moreover, composition
is distributive over addition, under the obvious hypotheses on domains and
codomains:

$$(S_1 + S_2) \circ T = S_1 \circ T + S_2 \circ T \quad\text{and}\quad S \circ (T_1 + T_2) = S \circ T_1 + S \circ T_2.$$

Finally, composition commutes with scalar multiplication:

$$c(S \circ T) = (cS) \circ T = S \circ (cT).$$

Proof. We have

$$\begin{aligned} S \circ T(x\alpha + y\beta) &= S(T(x\alpha + y\beta)) = S(xT(\alpha) + yT(\beta)) \\ &= xS(T(\alpha)) + yS(T(\beta)) = x(S \circ T)(\alpha) + y(S \circ T)(\beta), \end{aligned}$$

so $S \circ T$ is linear. The two distributive laws will be left to the reader. □
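
For maps between Cartesian spaces these statements can be checked on matrices. The sketch below assumes the standard fact that the matrix of a composition is the product of the matrices; the helper names and the sample matrices are hypothetical.

```python
def apply(m, v):
    """Apply the linear map with matrix m (a list of rows) to the vector v."""
    return [sum(t * x for t, x in zip(row, v)) for row in m]


def compose(s, t):
    """Matrix of S o T, computed as the matrix product of s and t."""
    return [[sum(s[i][k] * t[k][j] for k in range(len(t)))
             for j in range(len(t[0]))] for i in range(len(s))]


def add(m1, m2):
    """Matrix of the sum of two linear maps with the same domain and codomain."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Hypothetical maps: T from R^2 to R^3, and S1, S2 from R^3 to R^2.
T = [[1, 2], [0, 1], [3, 0]]
S1 = [[1, 0, -1], [2, 1, 0]]
S2 = [[0, 1, 1], [1, 0, 2]]

v = [1, -1]
same_value = apply(compose(S1, T), v) == apply(S1, apply(T, v))
distributive = compose(add(S1, S2), T) == add(compose(S1, T), compose(S2, T))

print(compose(S1, T), same_value, distributive)
```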


Corollary. If $T \in$ Hom(V, W) is fixed, then composition on the right by T
is a linear transformation from the vector space Hom(W, X) to the vector
space Hom(V, X). It is an isomorphism if T is an isomorphism.


Proof. The algebraic properties of composition stated in the theorem can be
combined as follows:

$$(c_1 S_1 + c_2 S_2) \circ T = c_1 (S_1 \circ T) + c_2 (S_2 \circ T),$$

$$S \circ (c_1 T_1 + c_2 T_2) = c_1 (S \circ T_1) + c_2 (S \circ T_2).$$

The first equation says exactly that composition on the right by a fixed T is a
linear transformation. (Write $S \circ T$ as $\mathcal{T}(S)$ if the equations still don't look
right.) If T is an isomorphism, then composition by $T^{-1}$ "undoes" composition
by T, and so is its inverse. □




The second equation implies a similar corollary about composition on the 
left by a fixed S. 


Theorem 3.4. If W is a product vector space, $W = \prod_i W_i$, then a mapping
T from a vector space V to W is linear if and only if $\pi_i \circ T$ is linear for
each coordinate projection $\pi_i$.


Proof. If T is linear, then $\pi_i \circ T$ is linear by the above theorem. Now suppose,
conversely, that all the maps $\pi_i \circ T$ are linear. Then

$$\begin{aligned} \pi_i(T(x\alpha + y\beta)) &= \pi_i \circ T(x\alpha + y\beta) = x(\pi_i \circ T)(\alpha) + y(\pi_i \circ T)(\beta) \\ &= x\pi_i(T(\alpha)) + y\pi_i(T(\beta)) = \pi_i(xT(\alpha) + yT(\beta)). \end{aligned}$$

But if $\pi_i(f) = \pi_i(g)$ for all i, then f = g. Therefore, $T(x\alpha + y\beta) = xT(\alpha) + yT(\beta)$,
and T is linear. □


If T is a linear mapping from $\mathbb{R}^n$ to W whose skeleton is $\{\beta_j\}_1^n$, then $\pi_i \circ T$
has skeleton $\{\pi_i(\beta_j)\}_{j=1}^n$. If W is $\mathbb{R}^m$, then $\pi_i$ is the ith coordinate functional
$y \mapsto y_i$, and $\beta_j$ is the jth column in the matrix $t = \{t_{ij}\}$ of T. Thus $\pi_i(\beta_j) = t_{ij}$,
and $\pi_i \circ T$ is the linear functional whose skeleton is the ith row of the matrix of T.

In the discussion centering around Theorem 1.3, we replaced the vector
equation y = T(x) by the equivalent set of m scalar equations $y_i = \sum_{j=1}^{n} t_{ij} x_j$,
which we obtained by reading off the ith coordinate in the vector equation. But
in "reading off" the ith coordinate we were applying the coordinate mapping
$\pi_i$, or in more algebraic terms, we were replacing the linear map T by the set of
linear maps $\{\pi_i \circ T\}$, which is equivalent to it by the above theorem.
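
The relation between columns, rows, and the maps $\pi_i \circ T$ can be traced in a few lines. This is a sketch under the assumption that T is given by a matrix stored as a list of rows; the sample matrix and helper names are hypothetical, and indices are 0-based.

```python
def apply(t, x):
    """Apply the linear map with matrix t (a list of rows) to the vector x."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]


def delta(j, n):
    """The jth standard basis vector of R^n (0-based index)."""
    return [1 if k == j else 0 for k in range(n)]


def projection(i):
    """The ith coordinate functional y -> y_i."""
    return lambda y: y[i]


# Hypothetical matrix of a map T from R^3 to R^2.
t = [[1, 2, 3],
     [0, -1, 4]]
n, m = 3, 2

# The skeleton of T consists of the columns beta_j = T(delta^j).
skeleton = [apply(t, delta(j, n)) for j in range(n)]

# The skeleton of pi_i o T is {pi_i(beta_j)}, which should be the ith row of t.
rows = [[projection(i)(beta) for beta in skeleton] for i in range(m)]

print(skeleton, rows == t)
```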

Now consider in particular the space Hom(V, V), which we may as well
designate 'Hom(V)'. In addition to being a vector space, it is also closed under
composition, which we consider a multiplication operation. Since composition
of functions is always associative (see Section 0.9), we thus have for multiplication
the laws

$$\begin{aligned} A \circ (B \circ C) &= (A \circ B) \circ C, \\ A \circ (B + C) &= (A \circ B) + (A \circ C), \\ (A + B) \circ C &= (A \circ C) + (B \circ C), \\ k(A \circ B) &= (kA) \circ B = A \circ (kB). \end{aligned}$$

Any vector space which has in addition to the vector operations an operation
of multiplication related to the vector operations in the above ways is called
an algebra. Thus,


Theorem 3.5. Hom(V) is an algebra. 


We noticed earlier that certain real-valued function spaces are also algebras.
Examples were $\mathbb{R}^A$ and $\mathcal{C}([0, 1])$. In these cases multiplication is commutative,
but in the case of Hom(V) multiplication is not commutative unless V is a
trivial space (V = {0}) or V is isomorphic to $\mathbb{R}$. We shall check this later when
we examine the finite-dimensional theory in greater detail.
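
A single pair of maps already shows the failure of commutativity in Hom($\mathbb{R}^2$). The two matrices below are a hypothetical example, and the matrix product is assumed to represent composition.

```python
def compose(a, b):
    """Matrix of A o B for square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical elements of Hom(R^2), given by their matrices.
A = [[0, 1],
     [0, 0]]
B = [[0, 0],
     [1, 0]]

AB = compose(A, B)
BA = compose(B, A)

print(AB, BA, AB == BA)
```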




Product projections and injections. In addition to the coordinate projections,
there is a second class of simple linear mappings that is of basic importance in
the handling of a Cartesian product space $W = \prod_{k \in K} W_k$. These are, for each
j, the mapping $\theta_j$ taking a vector $\alpha \in W_j$ to the function in the product space
having the value $\alpha$ at the index j and 0 elsewhere. For example, $\theta_2$ for
$W_1 \times W_2 \times W_3$ is the mapping $\alpha \mapsto \langle 0, \alpha, 0 \rangle$ from $W_2$ to W. Or if we view $\mathbb{R}^3$ as
$\mathbb{R} \times \mathbb{R}^2$, then $\theta_2$ is the mapping $\langle x_2, x_3 \rangle \mapsto \langle 0, \langle x_2, x_3 \rangle \rangle = \langle 0, x_2, x_3 \rangle$.
We call $\theta_j$ the injection of $W_j$ into $\prod_k W_k$. The linearity of $\theta_j$ is probably obvious.
The mappings $\pi_j$ and $\theta_j$ are clearly connected, and the following projection-injection
identities state their exact relationship. If $I_j$ is the identity transformation
on $W_j$, then

$$\pi_j \circ \theta_j = I_j \quad\text{and}\quad \pi_j \circ \theta_i = 0 \text{ if } i \neq j.$$

If K is finite and I is the identity on the product space W, then

$$\sum_{k \in K} \theta_k \circ \pi_k = I.$$

In the case $\prod_{i=1}^{3} W_i$, we have $\theta_2 \circ \pi_2(\langle \alpha_1, \alpha_2, \alpha_3 \rangle) = \langle 0, \alpha_2, 0 \rangle$, and
the identity simply says that $\langle \alpha_1, 0, 0 \rangle + \langle 0, \alpha_2, 0 \rangle + \langle 0, 0, \alpha_3 \rangle =
\langle \alpha_1, \alpha_2, \alpha_3 \rangle$ for all $\alpha_1, \alpha_2, \alpha_3$. These identities will probably be clear to the
reader, and we leave the formal proofs as an exercise.
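
The projection-injection identities can be verified on a small finite product. The sketch assumes, purely for illustration, three factors each equal to $\mathbb{R}^2$, with vectors stored as tuples and indices counted from 0; the helper names are hypothetical.

```python
ZERO = (0, 0)   # zero vector of each factor W_k = R^2 (an assumption of this sketch)
N = 3           # number of factors


def vec_add(u, v):
    """Addition in a single factor."""
    return tuple(a + b for a, b in zip(u, v))


def add_in_product(u, v):
    """Coordinatewise addition in the product space."""
    return tuple(vec_add(a, b) for a, b in zip(u, v))


def projection(j):
    """pi_j: read off the jth coordinate of an element of the product."""
    return lambda w: w[j]


def injection(j):
    """theta_j: place alpha at index j and the zero vector elsewhere."""
    return lambda alpha: tuple(alpha if k == j else ZERO for k in range(N))


w = ((1, 2), (3, 4), (5, 6))
alpha = (3, 4)

# Sum over k of theta_k(pi_k(w)); by the identity it should reproduce w.
total = (ZERO,) * N
for k in range(N):
    total = add_in_product(total, injection(k)(projection(k)(w)))

print(projection(1)(injection(1)(alpha)), projection(0)(injection(1)(alpha)), total == w)
```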

The coordinate projections $\pi_i$ are useful in the study of any product space,
but because of the limitation in the above identity, the injections $\theta_j$ are of
interest principally in the case of finite products. Together they enable us to
decompose and reassemble linear maps whose domains or codomains are finite
product spaces.

For a simple example, consider the T in Hom($\mathbb{R}^3$, $\mathbb{R}^2$) whose matrix is

$$\begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$

Then $\pi_1 \circ T$ is the linear functional whose skeleton <2, -1, 1> is the first row
in the matrix of T, and we know that we can visualize its expression in equation
form, $y_1 = 2x_1 - x_2 + x_3$, as being obtained from the vector equation y =
T(x) by "reading off the first row". Thus we "decompose" T into the two linear
functionals $l_i = \pi_i \circ T$. Then, speaking loosely, we have the reassembly
$T = \langle l_1, l_2 \rangle$; more exactly, $T(x) = \langle 2x_1 - x_2 + x_3, x_1 + x_2 + 4x_3 \rangle =
\langle l_1(x), l_2(x) \rangle$ for all x. However, we want to present this reassembly as the
action of the linear maps $\theta_1$ and $\theta_2$. We have

$$\langle l_1(x), l_2(x) \rangle = \theta_1(l_1(x)) + \theta_2(l_2(x)) = (\theta_1 \circ \pi_1 + \theta_2 \circ \pi_2)(T(x)) = T(x),$$

which shows that the decomposition and reassembly of T is an expression of the
identity $\sum_i \theta_i \circ \pi_i = I$. In general, if $T \in$ Hom(V, W) and $W = \prod_i W_i$, then
$T_i = \pi_i \circ T$ is in Hom(V, $W_i$) for each i, and $T_i$ can be considered "the part
of T going into $W_i$", since $T_i(\alpha)$ is the ith coordinate of $T(\alpha)$ for each $\alpha$. Then we




can reassemble the $T_i$'s to form T again by $T = \sum_i \theta_i \circ T_i$, for $\sum_i \theta_i \circ T_i =
(\sum_i \theta_i \circ \pi_i) \circ T = I \circ T = T$. Moreover, any finite collection of $T_i$'s on a
common domain can be put together in this way to make a T. For example,
we can assemble an m-tuple $\{T_i\}_1^m$ of linear maps on a common domain V to
form a single m-tuple-valued linear map T. Given $\alpha$ in V, we simply define
$T(\alpha)$ as that m-tuple whose ith coordinate is $T_i(\alpha)$ for i = 1, ..., m, and then
check that T is linear. Thus without having to calculate, we see from this
assembly principle that $T: x \mapsto \langle 2x_1 - x_2 + x_3, x_1 + x_2 + 4x_3 \rangle$ is a linear
mapping from $\mathbb{R}^3$ to $\mathbb{R}^2$, since we have formed T by assembling the two linear
functionals $l_1(x) = 2x_1 - x_2 + x_3$ and $l_2(x) = x_1 + x_2 + 4x_3$ to form a
single ordered-pair-valued map. This very intuitive process has an equally
simple formal justification. We rigorize our discussion in the following theorem.


Theorem 3.6. If $T_i$ is in Hom(V, $W_i$) for each i in a finite index set I,
and if W is the product space $\prod_{i \in I} W_i$, then there is a uniquely determined
T in Hom(V, W) such that $T_i = \pi_i \circ T$ for all i in I.


Proof. If T exists such that $T_i = \pi_i \circ T$ for each i, then $T = I_W \circ T =
(\sum_i \theta_i \circ \pi_i) \circ T = \sum_i \theta_i \circ (\pi_i \circ T) = \sum_i \theta_i \circ T_i$. Thus T is uniquely determined
as $\sum_i \theta_i \circ T_i$. Moreover, this T does have the required property, since then

$$\pi_j \circ T = \pi_j \circ \Big(\sum_i \theta_i \circ T_i\Big) = \sum_i (\pi_j \circ \theta_i) \circ T_i = I_j \circ T_j = T_j. \ \square$$
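
The assembly of coordinate maps into a single map can be sketched as follows, using the two linear functionals of the running example. The helper `assemble` and the test vectors are assumptions made for illustration; the check at the end confirms linearity only on one sample combination.

```python
def l1(x):
    """First coordinate functional of the example: 2*x1 - x2 + x3."""
    return 2 * x[0] - x[1] + x[2]


def l2(x):
    """Second coordinate functional of the example: x1 + x2 + 4*x3."""
    return x[0] + x[1] + 4 * x[2]


def assemble(parts):
    """Form the map T whose ith coordinate is T_i(x), so that pi_i o T = T_i."""
    return lambda x: tuple(part(x) for part in parts)


T = assemble([l1, l2])

# Hypothetical test vectors for a spot check of linearity.
x = [1, 0, 2]
y = [0, 1, -1]
combo = [2 * xi + 3 * yi for xi, yi in zip(x, y)]
expected = tuple(2 * a + 3 * b for a, b in zip(T(x), T(y)))

print(T(x), T(y), T(combo) == expected)
```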


In the same way, we can decompose a linear T whose domain is a product
space $V = \prod_{j=1}^{n} V_j$ into the maps $T_j = T \circ \theta_j$ with domains $V_j$, and then
reassemble these maps to form T by the identity $T = \sum_{j=1}^{n} T_j \circ \pi_j$ (check it
mentally!). Moreover, a finite collection of maps into a common codomain
space can be put together to form a single map on the product of the domain
spaces. Thus an n-tuple of maps $\{T_i\}_1^n$ into W defines a single map T into W,
where the domain of T is the product of the domains of the $T_i$'s, by the equation
$T(\langle \alpha_1, \ldots, \alpha_n \rangle) = \sum_{i=1}^{n} T_i(\alpha_i)$ or $T = \sum_{i=1}^{n} T_i \circ \pi_i$. For example, if $T_1: \mathbb{R} \to \mathbb{R}^2$
is the map $t \mapsto t\langle 2, 1 \rangle = \langle 2t, t \rangle$, and $T_2$ and $T_3$ are similarly the maps
$t \mapsto t\langle -1, 1 \rangle$ and $t \mapsto t\langle 1, 4 \rangle$, then $T = \sum_{i=1}^{3} T_i \circ \pi_i$ is the mapping from
$\mathbb{R}^3$ to $\mathbb{R}^2$ whose matrix is

$$\begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$

Again there is a simple formal argument, and we shall ask the reader to write
out the proof of the following theorem.


Theorem 3.7. If $T_j$ is in $\mathrm{Hom}(V_j, W)$ for each $j$ in a finite index set $J$,
and if $V = \prod_{j \in J} V_j$, then there exists a unique $T$ in $\mathrm{Hom}(V, W)$ such
that $T \circ \theta_j = T_j$ for each $j$ in $J$.
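
The example preceding Theorem 3.7 can be worked through in code. This is a small sketch under the assumption that maps $\mathbb{R} \to \mathbb{R}^2$ are represented as Python functions returning tuples; `scale` and `T_parts` are hypothetical names introduced only for illustration.

```python
# Sketch: build T = sum_i T_i o pi_i on R^3 from three maps T_i : R -> R^2.

def scale(v):
    """The map t |-> t*v from R into R^2."""
    return lambda t: tuple(t * c for c in v)

T_parts = [scale((2, 1)), scale((-1, 1)), scale((1, 4))]

def T(x):
    """T(x1, x2, x3) = T_1(x1) + T_2(x2) + T_3(x3)."""
    images = [T_i(x_i) for T_i, x_i in zip(T_parts, x)]
    return tuple(sum(col) for col in zip(*images))

# T o theta_j = T_j: column j of the matrix is T applied to the jth unit vector.
unit = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
columns = [T(e) for e in unit]
matrix = [list(row) for row in zip(*columns)]

for j, e in enumerate(unit):
    assert T(e) == T_parts[j](1)
```

Reading off the columns this way recovers the matrix of the assembled map, whose rows are the two functionals of the earlier example.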


Finally we should mention that Theorem 3.6 holds for all product spaces,
finite or not, and states a property that characterizes product spaces. We shall
investigate this situation in the exercises. The proof of the general case of
Theorem 3.6 has to get along without the injections $\theta_j$; instead, it is an
application of Theorem 3.4.

The reader may feel that we are being overly formal in using the projections
$\pi_i$ and the injections $\theta_i$ to give algebraic formulations of processes that are
easily visualized directly, such as reading off the scalar “components” of a
vector equation. However, the mappings

$$x \mapsto x_i \quad\text{and}\quad x \mapsto \langle 0, \ldots, 0, x, 0, \ldots, 0 \rangle$$

are clearly fundamental devices, and making their relationships explicit now
will be helpful to us later on when we have to handle their occurrences in more
complicated situations.


EXERCISES 


3.1 Show that $\mathbb{R}^m \times \mathbb{R}^n$ is isomorphic to $\mathbb{R}^{n+m}$.

3.2 Show more generally that if $\sum_1^k n_i = n$, then $\prod_{i=1}^k \mathbb{R}^{n_i}$ is isomorphic to $\mathbb{R}^n$.

3.3 Show that if $\{B, C\}$ is a partitioning of $A$, then $\mathbb{R}^A$ and $\mathbb{R}^B \times \mathbb{R}^C$ are isomorphic.

3.4 Generalize the above to the case where $\{A_i\}_1^n$ partitions $A$.

3.5 Show that a mapping $T$ from a vector space $V$ to a vector space $W$ is linear if
and only if (the graph of) $T$ is a subspace of $V \times W$.

3.6 Let $S$ and $T$ be nonzero linear maps from $V$ to $W$. The definition of the map
$S + T$ is not the same as the set sum of (the graphs of) $S$ and $T$ as subspaces of $V \times W$.
Show that the set sum of (the graphs of) $S$ and $T$ cannot be a graph unless $S = T$.


3.7 Give the justification for each step of the calculation in Theorem 3.2.

3.8 Prove the distributive laws given in Theorem 3.3.

3.9 Let $D: \mathcal{C}^1([a, b]) \to \mathcal{C}([a, b])$ be differentiation, and let $S: \mathcal{C}([a, b]) \to \mathbb{R}$ be the
definite integral map $f \mapsto \int_a^b f$. Compute the composition $S \circ D$.

3.10 We know that the general linear functional $F$ on $\mathbb{R}^2$ is the map $x \mapsto a_1x_1 + a_2x_2$
determined by the pair $a$ in $\mathbb{R}^2$, and that the general linear map $T$ in $\mathrm{Hom}(\mathbb{R}^2)$ is
determined by a matrix

$$t = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}.$$

Then $F \circ T$ is another linear functional, and hence is of the form $x \mapsto b_1x_1 + b_2x_2$ for
some $b$ in $\mathbb{R}^2$. Compute $b$ from $t$ and $a$. Your computation should show you that
$a \mapsto b$ is linear. What is its matrix?


3.11 Given $S$ and $T$ in $\mathrm{Hom}(\mathbb{R}^2)$ whose matrices are


sa} om fo i) 


respectively, find the matrix of $S \circ T$ in $\mathrm{Hom}(\mathbb{R}^2)$.




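3.12 Given $S$ and $T$ in $\mathrm{Hom}(\mathbb{R}^2)$ whose matrices are

$$s = \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} \quad\text{and}\quad t = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix},$$

find the matrix of $S \circ T$.

3.13 With the above answer in mind, what would you guess the matrix of $S \circ T$ is
if $S$ and $T$ are in $\mathrm{Hom}(\mathbb{R}^3)$? Verify your guess.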


3.14 We know that if $T \in \mathrm{Hom}(V, W)$ is an isomorphism, then $T^{-1}$ is an isomorphism
in $\mathrm{Hom}(W, V)$. Prove that

$$S \circ T \text{ surjective} \Rightarrow S \text{ surjective}, \qquad S \circ T \text{ injective} \Rightarrow T \text{ injective},$$

and, therefore, that if $T \in \mathrm{Hom}(V, W)$, $S \in \mathrm{Hom}(W, V)$, and

$$S \circ T = I_V, \qquad T \circ S = I_W,$$

then $T$ is an isomorphism.

3.15 Show that if $S^{-1}$ and $T^{-1}$ exist, then $(S \circ T)^{-1}$ exists and equals $T^{-1} \circ S^{-1}$.
Give a more careful statement of this result.

3.16 Show that if $S$ and $T$ in $\mathrm{Hom}\,V$ commute with each other, then the null space of
$T$, $N = N(T)$, and its range $R = R(T)$ are invariant under $S$ ($S[N] \subset N$ and $S[R] \subset R$).

3.17 Show that if $\alpha$ is an eigenvector of $T$ and $S$ commutes with $T$, then $S(\alpha)$ is
an eigenvector of $T$ and has the same eigenvalue.

3.18 Show that if $S$ commutes with $T$ and $T^{-1}$ exists, then $S$ commutes with $T^{-1}$.

3.19 Given that $\alpha$ is an eigenvector of $T$ with eigenvalue $x$, show that $\alpha$ is also an
eigenvector of $T^2 = T \circ T$, of $T^n$, and of $T^{-1}$ (if $T$ is invertible) and that the
corresponding eigenvalues are $x^2$, $x^n$, and $1/x$.

Given that $p(t)$ is a polynomial in $t$, define the operator $p(T)$, and under the above
hypotheses, show that $\alpha$ is an eigenvector of $p(T)$ with eigenvalue $p(x)$.

3.20 If $S$ and $T$ are in $\mathrm{Hom}\,V$, we say that $S$ doubly commutes with $T$ (and write
$S \text{ cc } T$) if $S$ commutes with every $A$ in $\mathrm{Hom}\,V$ which commutes with $T$. Fix $T$, and
set $\{T\}'' = \{S : S \text{ cc } T\}$. Show that $\{T\}''$ is a commutative subalgebra of $\mathrm{Hom}\,V$.

3.21 Given $T$ in $\mathrm{Hom}\,V$ and $\alpha$ in $V$, let $N$ be the linear span of the “trajectory of
$\alpha$ under $T$” (the set $\{T^n\alpha : n \in \mathbb{Z}^+\}$). Show that $N$ is invariant under $T$.

3.22 A transformation $T$ in $\mathrm{Hom}\,V$ such that $T^n = 0$ for some $n$ is said to be nilpotent.
Show that if $T$ is nilpotent, then $I - T$ is invertible. [Hint: The power series

$$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n$$

is a finite sum if $x$ is replaced by $T$.]

3.23 Suppose that $T$ is nilpotent, that $S$ commutes with $T$, and that $S^{-1}$ exists, where
$S, T \in \mathrm{Hom}\,V$. Show that $(S - T)^{-1}$ exists.

3.24 Let $\varphi$ be an isomorphism from a vector space $V$ to a vector space $W$. Show that
$T \mapsto \varphi \circ T \circ \varphi^{-1}$ is an algebra isomorphism from the algebra $\mathrm{Hom}\,V$ to the algebra
$\mathrm{Hom}\,W$.




3.25 Show the $\pi_i$'s and $\theta_i$'s explicitly for $\mathbb{R}^3 = \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ using the stopped arrow
notation. Also write out the identity $\sum \theta_i \circ \pi_i = I$ in explicit form.

3.26 Do the same for $\mathbb{R}^5 = \mathbb{R}^2 \times \mathbb{R}^3$.

3.27 Show that the first two projection-injection identities ($\pi_i \circ \theta_i = I_i$ and $\pi_j \circ \theta_i = 0$
if $j \neq i$) are simply a restatement of the definition of $\theta_i$. Show that the linearity of $\theta_i$
follows formally from these identities and Theorem 3.4.

3.28 Prove the identity $\sum \theta_i \circ \pi_i = I$ by applying $\pi_j$ to the equation and remembering
that $f = g$ if $\pi_j(f) = \pi_j(g)$ for all $j$ (this being just the equation $f(j) = g(j)$ for all $j$).

3.29 Prove the general case of Theorem 3.6. We are given an indexed collection of
linear maps $\{T_i : i \in I\}$ with common domain $V$ and codomains $\{W_i : i \in I\}$. The
first question is how to define $T: V \to W = \prod_i W_i$. Do this by defining $T(\xi)$ suitably
for each $\xi \in V$ and then applying Theorem 3.4 to conclude that $T$ is linear.

3.30 Prove Theorem 3.7.

3.31 We know without calculation that the map

$$T: x \mapsto \langle 3x_1 - x_2 + x_3,\ x_2 + x_3,\ x_1 - 5x_3,\ 2x_2 \rangle$$

from $\mathbb{R}^3$ to $\mathbb{R}^4$ is linear. Why? (Cite relevant theorems from the text.)

3.32 Write down the matrix for the transformation $T$ in the above example, and then
write down the mappings $T \circ \theta_i$ from $\mathbb{R}$ to $\mathbb{R}^4$ (for $i = 1, 2, 3$) in explicit ordered
quadruplet form.

3.33 Let $W = \prod_1^n W_i$ be a finite product vector space and set $p_i = \theta_i \circ \pi_i$, so that
$p_i$ is in $\mathrm{Hom}\,W$ for all $i$. Prove from the projection-injection identities that $\sum_1^n p_i = I$
(the identity map on $W$), $p_i \circ p_j = 0$ if $i \neq j$, and $p_i \circ p_i = p_i$. Identify the range
$R_i = R(p_i)$.

3.34 In the context of the above exercise, define $T$ in $\mathrm{Hom}\,W$ as

$$T = \sum_{m=1}^{n} m\,p_m.$$

Show that $\alpha$ is an eigenvector of $T$ if and only if $\alpha$ is in one of the subspaces $R_i$ and that
then the eigenvalue of $\alpha$ is $i$.

3.35 In the same situation show that the polynomial

$$\prod_{j=1}^{n} (T - jI) = (T - I) \circ \cdots \circ (T - nI)$$

is the zero transformation.


3.36 Theorems 3.6 and 3.7 can be combined if $T \in \mathrm{Hom}(V, W)$, where both $V$ and $W$
are product spaces:

$$V = \prod_1^n V_i \quad\text{and}\quad W = \prod_1^m W_j.$$

State and prove a theorem which says that such a $T$ can be decomposed into a doubly
indexed family $\{T_{ij}\}$ where $T_{ij} \in \mathrm{Hom}(V_i, W_j)$ and conversely that any such doubly
indexed family can be assembled to form a single $T$ from $V$ to $W$.




3.37 Apply your theorem to the special case where $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ (that is,
$V_i = W_j = \mathbb{R}$ for all $i$ and $j$). Now $T_{ij}$ is from $\mathbb{R}$ to $\mathbb{R}$ and hence is simply
multiplication by a number $t_{ij}$. Show that the indexed collection $\{t_{ij}\}$ of these numbers
is the matrix of $T$.

3.38 Given an $m$-tuple of vector spaces $\{W_i\}_1^m$, suppose that there are a vector space
$X$ and maps $p_i$ in $\mathrm{Hom}(X, W_i)$, $i = 1, \ldots, m$, with the following property:

P. For any $m$-tuple of linear maps $\{T_i\}$ from a common domain space $V$ to the
above spaces $W_i$ (so that $T_i \in \mathrm{Hom}(V, W_i)$, $i = 1, \ldots, m$), there is a unique $T$
in $\mathrm{Hom}(V, X)$ such that $T_i = p_i \circ T$, $i = 1, \ldots, m$.

Prove that there is a “canonical” isomorphism from

$$W = \prod_1^m W_i \quad\text{to}\quad X$$

under which the given maps $p_i$ become the projections $\pi_i$. [Remark: The product space
$W$ itself has property P by Theorem 3.6, and this exercise therefore shows that P is an
abstract characterization of the product space.]


4. AFFINE SUBSPACES AND QUOTIENT SPACES 


In this section we shall look at the “planes” in a vector space $V$ and see what
happens to them when we translate them, intersect them with each other,
take their images under linear maps, and so on. Then we shall confine ourselves
to the set of all planes that are translates of a fixed subspace and discover that
this set itself is a vector space in the most obvious way. Some of this material
has been anticipated in Section 2.


Affine subspaces. If $N$ is a subspace of a vector space $V$ and $\alpha$ is any vector
of $V$, then the set $N + \alpha = \{\xi + \alpha : \xi \in N\}$ is called either the coset of $N$
containing $\alpha$ or the affine subspace of $V$ through $\alpha$ and parallel to $N$. The set $N + \alpha$
is also called the translate of $N$ through $\alpha$. We saw in Section 2 that affine
subspaces are the general objects that we want to call planes. If $N$ is given and fixed
in a discussion, we shall use the notation $\bar\alpha = N + \alpha$ (see Section 0.12).

We begin with a list of some simple properties of affine subspaces. Some of 
these will generalize observations already made in Section 2, and the proofs of 
some will be left as exercises. 

1) With a fixed subspace $N$ assumed, if $\gamma \in \bar\alpha$ then $\bar\gamma = \bar\alpha$. For if $\gamma =
\alpha + \eta_0$, then $\gamma + \eta = \alpha + (\eta_0 + \eta) \in \bar\alpha$, so $\bar\gamma \subset \bar\alpha$. Also $\alpha + \eta = \gamma + (\eta - \eta_0) \in \bar\gamma$,
so $\bar\alpha \subset \bar\gamma$. Thus $\bar\alpha = \bar\gamma$.

2) With $N$ fixed, for any $\alpha$ and $\beta$, either $\bar\alpha = \bar\beta$ or $\bar\alpha$ and $\bar\beta$ are disjoint.
For if $\bar\alpha$ and $\bar\beta$ are not disjoint, then there exists a $\gamma$ in each, and $\bar\alpha = \bar\gamma = \bar\beta$
by (1). The reader may find it illuminating to compare these calculations with
the more general ones of Section 0.12. Here $\alpha \sim \beta$ if and only if $\alpha - \beta \in N$.

3) Now let $\mathcal{A}$ be the collection of all affine subspaces of $V$; $\mathcal{A}$ is thus the set
of all cosets of all vector subspaces of $V$. Then the intersection of any
subfamily of $\mathcal{A}$ is either empty or itself an affine subspace. In fact, if $\{A_i\}_{i \in I}$ is
an indexed collection of affine subspaces and $A_i$ is a coset of the vector subspace
$W_i$ for each $i \in I$, then $\bigcap_{i \in I} A_i$ is either empty or a coset of the vector subspace
$\bigcap_{i \in I} W_i$. For if $\beta \in \bigcap_{i \in I} A_i$, then (1) implies that $A_i = \beta + W_i$ for all $i$, and
then $\bigcap A_i = \beta + \bigcap W_i$.

4) If $A, B \in \mathcal{A}$, then $A + B \in \mathcal{A}$. That is, the set sum of any two affine
subspaces is itself an affine subspace.

5) If $A \in \mathcal{A}$ and $T \in \mathrm{Hom}(V, W)$, then $T[A]$ is an affine subspace of $W$.
In particular, if $t \in \mathbb{R}$, then $tA \in \mathcal{A}$.

6) If $B$ is an affine subspace of $W$ and $T \in \mathrm{Hom}(V, W)$, then $T^{-1}[B]$ is
either empty or an affine subspace of $V$.


7) For a fixed $\alpha \in V$ the translation of $V$ through $\alpha$ is the mapping
$S_\alpha: V \to V$ defined by $S_\alpha(\xi) = \xi + \alpha$ for all $\xi \in V$. Translation is not linear;
for example, $S_\alpha(0) = \alpha$. It is clear, however, that translation carries affine
subspaces into affine subspaces. Thus $S_\alpha(A) = A + \alpha$ and $S_\alpha(\beta + W) =
(\alpha + \beta) + W$.

8) An affine transformation from a vector space $V$ to a vector space $W$ is a
linear mapping from $V$ to $W$ followed by a translation in $W$. Thus an affine
transformation is of the form $\xi \mapsto T(\xi) + \beta$, where $T \in \mathrm{Hom}(V, W)$ and $\beta \in W$.
Note that $\xi \mapsto T(\xi + \alpha)$ is affine, since

$$T(\xi + \alpha) = T(\xi) + \beta, \quad\text{where } \beta = T(\alpha).$$

It follows from (5) and (7) that an affine transformation carries affine
subspaces of $V$ into affine subspaces of $W$.


Quotient space. Now fix a subspace $N$ of $V$, and consider the set $W$ of all
translates (cosets) of $N$. We are going to see that $W$ itself is a vector space in
the most natural way possible. Addition will be set addition, and scalar
multiplication will be set multiplication (except in one special case). For example, if $N$
is a line through the origin in $\mathbb{R}^3$, then $W$ consists of all lines in $\mathbb{R}^3$ parallel to $N$.
We are saying that this set of parallel lines will automatically turn out to be a
vector space: the set sum of any two of the lines in $W$ turns out to be a line in $W$!
And if $L \in W$ and $t \neq 0$, then the set product $tL$ is a line in $W$. The translates
of $L$ fiber $\mathbb{R}^3$, and the set of fibers is a natural vector space.

During this discussion it will be helpful temporarily to indicate set sums by
‘$+_s$’ and set products by ‘$\cdot_s$’. With $N$ fixed, it follows from (2) above that two
cosets are disjoint or identical, so that the set $W$ of all cosets is a fibering of $V$
in the general case, just as it was in our example of the parallel lines. From (4)
or by a direct calculation we know that $\bar\alpha +_s \bar\beta = \overline{\alpha + \beta}$. Thus $W$ is closed
under set addition, and, naturally, we take this to be our operation of addition
on $W$. That is, we define $+$ on $W$ by $\bar\alpha + \bar\beta = \bar\alpha +_s \bar\beta$. Then the natural map
$\pi: \alpha \mapsto \bar\alpha$ from $V$ to $W$ preserves addition, $\pi(\alpha + \beta) = \pi(\alpha) + \pi(\beta)$, since
this is just our equation $\bar\alpha + \bar\beta = \overline{\alpha + \beta}$ above. Similarly, if $t \in \mathbb{R}$, then the
set product $t \cdot_s \bar\alpha$ is either $\overline{t\alpha}$ or $\{0\}$. Hence if we define $t\bar\alpha$ as the set product when
$t \neq 0$ and as $\bar 0 = N$ when $t = 0$, then $\pi$ also preserves scalar multiplication,

$$\pi(t\alpha) = t\pi(\alpha).$$
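
To make the coset operations concrete, here is a minimal sketch for one specific case chosen for illustration (not taken from the text): $V = \mathbb{Q}^2$ and $N$ the line spanned by $\langle 1, 1 \rangle$. Each coset meets the $x$-axis in exactly one point, and the code uses that point as a label for the coset; the function names are hypothetical.

```python
from fractions import Fraction

# Assumed setup: V = Q^2, N = span of (1, 1). The coset alpha + N is labeled
# by its unique point on the x-axis, namely alpha - b*(1, 1) for alpha = (a, b).

def pi(alpha):
    """Natural map V -> V/N, returning the label of the coset of alpha."""
    a, b = alpha
    return (Fraction(a) - Fraction(b), Fraction(0))

def add(c1, c2):
    """Coset addition, computed on representatives."""
    return pi((c1[0] + c2[0], c1[1] + c2[1]))

def smul(t, c):
    """Scalar multiplication of a coset, computed on a representative."""
    return pi((t * c[0], t * c[1]))

alpha, beta = (3, 1), (-2, 5)

# Two vectors lie in the same coset exactly when their difference is in N.
assert pi(alpha) == pi((4, 2))
# pi preserves addition and scalar multiplication.
assert pi((alpha[0] + beta[0], alpha[1] + beta[1])) == add(pi(alpha), pi(beta))
assert pi((7 * alpha[0], 7 * alpha[1])) == smul(7, pi(alpha))
# The special case t = 0: the result is the coset N itself, the zero of V/N.
assert smul(0, pi(alpha)) == pi((0, 0))
```

Working with labels sidesteps the special case $t = 0$ of the set product, which is exactly why the text defines $0\bar\alpha$ to be $N$ rather than $\{0\}$.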


We thus have two vectorlike operations on the set $W$ of all cosets of $N$,
and we naturally expect $W$ to turn out to be a vector space. We could prove this
by verifying all the laws, but it is more elegant to notice the general setting for
such a verification proof.

Theorem 4.1. Let $V$ be a vector space, and let $W$ be a set having two
vectorlike operations, which we designate in the usual way. Suppose that
there exists a surjective mapping $T: V \to W$ which preserves the operations:
$T(s\alpha + t\beta) = sT(\alpha) + tT(\beta)$. Then $W$ is a vector space.

Proof. We have to check laws A1 through S4. However, one example should
make it clear to the reader how to proceed. We show that $T(0)$ satisfies A3 and
hence is the zero vector of $W$. Since every $\beta \in W$ is of the form $T(\alpha)$, we have

$$T(0) + \beta = T(0) + T(\alpha) = T(0 + \alpha) = T(\alpha) = \beta,$$

which is A3. We shall ask the reader to check more of the laws in the exercises. $\square$


Theorem 4.2. The set of cosets of a fixed subspace $N$ of a vector space $V$
themselves form a vector space, called the quotient space $V/N$, under the
above natural operations, and the projection $\pi$ is a surjective linear map
from $V$ to $V/N$.

Theorem 4.3. If $T$ is in $\mathrm{Hom}(V, W)$, and if the null space of $T$ includes the
subspace $M \subset V$, then $T$ has a unique factorization through $V/M$. That is,
there exists a unique transformation $S$ in $\mathrm{Hom}(V/M, W)$ such that $T =
S \circ \pi$.

Proof. Since $T$ is zero on $M$, it follows that $T$ is constant on each coset $A$ of $M$,
so that $T[A]$ contains only one vector. If we define $S(A)$ to be the unique
vector in $T[A]$, then $S(\bar\alpha) = T(\alpha)$, so $S \circ \pi = T$ by definition. Conversely, if
$T = R \circ \pi$, then $R(\bar\alpha) = R \circ \pi(\alpha) = T(\alpha)$, and $R$ is our above $S$. The linearity
of $S$ is practically obvious. Thus

$$S(\bar\alpha + \bar\beta) = S(\overline{\alpha + \beta}) = T(\alpha + \beta) = T(\alpha) + T(\beta) = S(\bar\alpha) + S(\bar\beta),$$

and homogeneity follows similarly. This completes the proof. $\square$
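
The factorization $T = S \circ \pi$ of Theorem 4.3 can be illustrated on a toy case. The sketch below assumes $V = \mathbb{Z}^2$ (integer vectors, for exact arithmetic), $M$ the line spanned by $\langle 1, 1 \rangle$, and a particular $T$ vanishing on $M$; these choices and all names are illustrative only.

```python
# Assumed setup: M = span of (1, 1); the coset of (a, b) is labeled by a - b,
# and (a - b, 0) is a representative of that coset.

def pi(alpha):
    """Natural map V -> V/M, returning the integer label of the coset."""
    return alpha[0] - alpha[1]

def T(alpha):
    """A linear map that vanishes on M (T(1, 1) = (0, 0))."""
    return (3 * (alpha[0] - alpha[1]), 0)

def S(label):
    """Induced map on V/M: evaluate T on the representative (label, 0)."""
    return T((label, 0))

# T is constant on cosets, so S is well defined and T = S o pi.
for alpha in [(5, 2), (6, 3), (-1, 4), (0, 0)]:
    assert S(pi(alpha)) == T(alpha)
assert pi((5, 2)) == pi((6, 3))
```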


One more remark is of interest here. If $N$ is invariant under a linear map
$T$ in $\mathrm{Hom}\,V$ (that is, $T[N] \subset N$), then for each $\alpha$ in $V$, $T[\bar\alpha]$ is a subset of the
coset $\overline{T(\alpha)}$, for

$$T[\bar\alpha] = T[\alpha + N] = T(\alpha) +_s T[N] \subset T(\alpha) +_s N = \overline{T(\alpha)}.$$

There is therefore a map $S: V/N \to V/N$ defined by the requirement that
$S(\bar\alpha) = \overline{T(\alpha)}$ (or $S \circ \pi = \pi \circ T$), and it is easy to check that $S$ is linear.
Therefore,

Theorem 4.4. If a subspace $N$ of a vector space $V$ is carried into itself by a
transformation $T$ in $\mathrm{Hom}\,V$, then there is a unique transformation $S$ in
$\mathrm{Hom}(V/N)$ such that $S \circ \pi = \pi \circ T$.


EXERCISES 


4.1 Prove properties (4), (5), and (6) of affine subspaces.

4.2 Choose an origin $O$ in the Euclidean plane $E^2$ (your sheet of paper), and let
$L_1$ and $L_2$ be two parallel lines not containing $O$. Let $X$ and $Y$ be distinct points on
$L_1$ and $Z$ any point on $L_2$. Draw the figure giving the geometric sums

$$\overrightarrow{OX} + \overrightarrow{OZ} \quad\text{and}\quad \overrightarrow{OY} + \overrightarrow{OZ}$$

(parallelogram rule), and state the theorem from plane geometry that says that these
two sum points are on a third line $L_3$ parallel to $L_1$ and $L_2$.

4.3 a) Prove the associative law for addition for Theorem 4.1.

b) Prove also laws A4 and S2.

4.4 Return now to Exercise 2.1 and reexamine the situation in the light of Theorem
4.1. Show, finally, how we really know that the geometric vectors form a vector space.

4.5 Prove that the mapping $S$ of Theorem 4.3 is injective if and only if $N$ is the
null space of $T$.

4.6 We know from Exercise 4.5 that if $T$ is a surjective element of $\mathrm{Hom}(V, W)$ and
$N$ is the null space of $T$, then the $S$ of Theorem 4.3 is an isomorphism from $V/N$ to $W$.
Its inverse $S^{-1}$ assigns a coset of $N$ to each $\eta$ in $W$. Show that the process of “indefinite
integration” is an example of such a map $S^{-1}$. This is the process of calculating an
integral and adding an arbitrary constant, as in

$$\int \sin x \, dx = -\cos x + c.$$

4.7 Suppose that $N$ and $M$ are subspaces of a vector space $V$ and that $N \subset M$.
Show that then $M/N$ is a subspace of $V/N$ and that $V/M$ is naturally isomorphic to the
quotient space $(V/N)/(M/N)$. [Hint: Every coset of $N$ is a subset of some coset of $M$.]

4.8 Suppose that $N$ and $M$ are any subspaces of a vector space $V$. Prove that
$(M + N)/N$ is naturally isomorphic to $M/(M \cap N)$. (Start with the fact that each
coset of $M \cap N$ is included in a unique coset of $N$.)

4.9 Prove that the map $S$ of Theorem 4.4 is linear.

4.10 Given $T \in \mathrm{Hom}\,V$, show that $T^2 = 0$ ($T^2 = T \circ T$) if and only if $R(T) \subset N(T)$.

4.11 Suppose that $T \in \mathrm{Hom}\,V$ and the subspace $N$ are such that $T$ is the identity
on $N$ and also on $V/N$. The latter assumption is that the $S$ of Theorem 4.4 is the
identity on $V/N$. Set $R = T - I$, and use the above exercise to show that $R^2 = 0$.
Show that if $T = I + R$ and $R^2 = 0$, then there is a subspace $N$ such that $T$ is the
identity on $N$ and also on $V/N$.




4.12 We now view the above situation a little differently. Supposing that $T$ is the
identity on $N$ and on $V/N$, and setting $R = T - I$, show that there exists a
$K \in \mathrm{Hom}(V/N, V)$ such that $R = K \circ \pi$. Show that for any coset $A$ of $N$ the action
of $T$ on $A$ can be viewed as translation through $K(A)$. That is, if $\xi \in A$ and $\eta = K(A)$,
then $T(\xi) = \xi + \eta$.

4.13 Consider the map $T: \langle x_1, x_2 \rangle \mapsto \langle x_1 + 2x_2, x_2 \rangle$ in $\mathrm{Hom}\,\mathbb{R}^2$, and let $N$ be
the null space of $R = T - I$. Identify $N$ and show that $T$ is the identity on $N$ and
on $\mathbb{R}^2/N$. Find the map $K$ of the above exercise. Such a mapping $T$ is called a shear
transformation of $V$ parallel to $N$. Draw the unit square and its image under $T$.

4.14 If we remember that the linear span $L(A)$ of a subset $A$ of a vector space $V$ can
be defined as the intersection of all the subspaces of $V$ that include $A$, then the fact
that the intersection of any collection of affine subspaces of a vector space $V$ is either
an affine subspace or empty suggests that we define the affine span $M(A)$ of a nonempty
subset $A \subset V$ as the intersection of all affine subspaces including $A$. Then we know
from (3) in our list of affine properties that $M(A)$ is an affine subspace, and by its
definition above that it is the smallest affine subspace including $A$. We now naturally
wonder whether $M(A)$ can be directly described in terms of linear combinations.
Show first that if $\alpha \in A$, then $M(A) = L(A - \alpha) + \alpha$; then prove that $M(A)$ is the
set of all linear combinations $\sum x_i\alpha_i$ on $A$ such that $\sum x_i = 1$.

4.15 Show that the linear span of a set $B$ is the affine span of $B \cup \{0\}$.

4.16 Show that $M(A + \gamma) = M(A) + \gamma$ for any $\gamma$ in $V$ and that $M(xA) = xM(A)$
for any $x$ in $\mathbb{R}$.


5. DIRECT SUMS 


We come now to the heart of the chapter. It frequently happens that the study
of some phenomenon on a vector space $V$ leads to a finite collection of subspaces
$\{V_i\}$ such that $V$ is naturally isomorphic to the product space $\prod_i V_i$. Under
this isomorphism the maps $\theta_i \circ \pi_i$ on the product space become certain maps
$P_i$ in $\mathrm{Hom}\,V$, and the projection-injection identities are reflected in the identities
$\sum P_i = I$, $P_i \circ P_i = P_i$ for all $i$, and $P_i \circ P_j = 0$ if $i \neq j$. Also, $V_i = \text{range } P_i$.
The product structure that $V$ thus acquires is then used to study the
phenomenon that gave rise to it. For example, this is the way that we unravel the
structure of a linear transformation in $\mathrm{Hom}\,V$, the study of which is one of the
central problems in linear algebra.

Direct sums. If $V_1, \ldots, V_n$ are subspaces of the vector space $V$, then the
mapping $\pi: \langle \alpha_1, \ldots, \alpha_n \rangle \mapsto \sum_1^n \alpha_i$ is a linear transformation from $\prod_1^n V_i$
to $V$, since it is the sum $\pi = \sum_1^n \pi_i$ of the coordinate projections.

Definition. We shall say that the $V_i$'s are independent if $\pi$ is injective and
that $V$ is the direct sum of the $V_i$'s if $\pi$ is an isomorphism. We express the
latter relationship by writing $V = V_1 \oplus \cdots \oplus V_n = \bigoplus_1^n V_i$.


Thus $V = \bigoplus_{i=1}^n V_i$ if and only if $\pi$ is injective and surjective, i.e., if and
only if the subspaces $\{V_i\}_1^n$ are both independent and span $V$. A useful
restatement of the direct sum condition is that each $\alpha \in V$ is uniquely expressible as
a sum $\sum_1^n \alpha_i$, with $\alpha_i \in V_i$ for all $i$; $\alpha$ has some such expression because the $V_i$'s
span $V$, and the expression is unique by their independence.

For example, let $V = \mathcal{C}(\mathbb{R})$ be the space of real-valued continuous functions
on $\mathbb{R}$, let $V_e$ be the subset of even functions (functions $f$ such that $f(-x) = f(x)$
for all $x$), and let $V_o$ be the subset of odd functions (functions such that $f(-x) =
-f(x)$ for all $x$). It is clear that $V_e$ and $V_o$ are subspaces of $V$, and we claim that
$V = V_e \oplus V_o$. To see this, note that for any $f$ in $V$, $g(x) = (f(x) + f(-x))/2$
is even, $h(x) = (f(x) - f(-x))/2$ is odd, and $f = g + h$. Thus $V = V_e + V_o$.
Moreover, this decomposition of $f$ is unique, for if $f = g_1 + h_1$ also, where $g_1$
is even and $h_1$ is odd, then $g - g_1 = h_1 - h$, and therefore $g - g_1 = 0 =
h_1 - h$, since the only function that is both even and odd is zero. The even-odd
components of $e^x$ are the hyperbolic cosine and sine functions:

$$e^x = \frac{e^x + e^{-x}}{2} + \frac{e^x - e^{-x}}{2} = \cosh x + \sinh x.$$
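
The even-odd decomposition is easy to try numerically. The following is a small sketch, with real functions represented as Python callables; the helper names `even_part` and `odd_part` are illustrative, and the comparison uses floating-point tolerances rather than exact equality.

```python
import math

def even_part(f):
    """g(x) = (f(x) + f(-x)) / 2, the component of f in V_e."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """h(x) = (f(x) - f(-x)) / 2, the component of f in V_o."""
    return lambda x: (f(x) - f(-x)) / 2

g, h = even_part(math.exp), odd_part(math.exp)

for x in (-1.5, 0.0, 0.3, 2.0):
    # The components of exp are cosh and sinh.
    assert math.isclose(g(x), math.cosh(x))
    assert math.isclose(h(x), math.sinh(x), abs_tol=1e-12)
    # f = g + h, with g even and h odd.
    assert math.isclose(g(x) + h(x), math.exp(x))
    assert math.isclose(g(-x), g(x))
    assert math.isclose(h(-x), -h(x), abs_tol=1e-12)
```

In the language of the section, `even_part` and `odd_part` play the role of the two projections associated with the decomposition $V = V_e \oplus V_o$.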


Since $\pi$ is injective if and only if its null space is $\{0\}$ (Lemma 1.1), we have:

Lemma 5.1. The independence of the subspaces $\{V_i\}_1^n$ is equivalent to the
property that if $\alpha_i \in V_i$ for all $i$ and $\sum_1^n \alpha_i = 0$, then $\alpha_i = 0$ for all $i$.

Corollary. If the subspaces $\{V_i\}_1^n$ are independent, $\alpha_i \in V_i$ for all $i$, and
$\sum_1^n \alpha_i$ is an element of $V_j$, then $\alpha_i = 0$ for $i \neq j$.


We leave the proof to the reader. 
The case of two subspaces is particularly simple. 


Lemma 5.2. The subspaces $M$ and $N$ of $V$ are independent if and only if
$M \cap N = \{0\}$.

Proof. If $\alpha \in M$, $\beta \in N$, and $\alpha + \beta = 0$, then $\alpha = -\beta \in M \cap N$. If $M \cap N =
\{0\}$, this will further imply that $\alpha = \beta = 0$, so $M$ and $N$ are independent.
On the other hand, if $0 \neq \beta \in M \cap N$, and if we set $\alpha = -\beta$, then $\alpha \in M$,
$\beta \in N$, and $\alpha + \beta = 0$, so $M$ and $N$ are not independent. $\square$


Note that the first argument above is simply the general form of the unique- 
ness argument we gave earlier for the even-odd decomposition of a function 
on R. 


Corollary. $V = M \oplus N$ if and only if $V = M + N$ and $M \cap N = \{0\}$.


Definition. If $V = M \oplus N$, then $M$ and $N$ are called complementary
subspaces, and each is a complement of the other.


Warning: A subspace $M$ of $V$ does not have a unique complementary subspace
unless $M$ is trivial (that is, $M = \{0\}$ or $M = V$). If we view $\mathbb{R}^3$ as coordinatized
Euclidean 3-space, then $M$ is a proper subspace if and only if $M$ is a plane
containing the origin or $M$ is a line through the origin (see Fig. 1.9). If $M$ and $N$ are
proper subspaces one of which is a plane and the other a line not lying in that
plane, then $M$ and $N$ are complementary subspaces. Moreover, these are the
only nontrivial complementary pairs in $\mathbb{R}^3$. The reader will be asked to prove
some of these facts in the exercises and they all will be clear by the middle of
the next chapter.

Fig. 1.9


The following lemma is technically useful. 


Lemma 5.3. If $V_1$ and $V_0$ are independent subspaces of $V$ and $\{V_i\}_2^n$ are
independent subspaces of $V_0$, then $\{V_i\}_1^n$ are independent subspaces of $V$.

Proof. If $\alpha_i \in V_i$ for all $i$ and $\sum_1^n \alpha_i = 0$, then, setting $\alpha_0 = \sum_2^n \alpha_i$, we have
$\alpha_1 + \alpha_0 = 0$, with $\alpha_0 \in V_0$. Therefore, $\alpha_1 = \alpha_0 = 0$ by the independence of
$V_1$ and $V_0$. But then $\alpha_2 = \alpha_3 = \cdots = \alpha_n = 0$ by the independence of
$\{V_i\}_2^n$, and we are done (Lemma 5.1). $\square$

Corollary. $V = V_1 \oplus V_0$ and $V_0 = \bigoplus_{i=2}^n V_i$ together imply that
$V = \bigoplus_{i=1}^n V_i$.


Projections. If $V = \bigoplus_{i=1}^n V_i$, if $\pi$ is the isomorphism $\langle \alpha_1, \ldots, \alpha_n \rangle \mapsto
\alpha = \sum_1^n \alpha_i$, and if $\pi_j$ is the $j$th projection map $\langle \alpha_1, \ldots, \alpha_n \rangle \mapsto \alpha_j$ from
$\prod_{i=1}^n V_i$ to $V_j$, then $(\pi_j \circ \pi^{-1})(\alpha) = \alpha_j$.

Definition. We call $\alpha_j$ the $j$th component of $\alpha$, and we call the linear map
$P_j = \pi_j \circ \pi^{-1}$ the projection of $V$ onto $V_j$ (with respect to the given direct
sum decomposition of $V$). Since each $\alpha$ in $V$ is uniquely expressible as a sum
$\alpha = \sum_1^n \alpha_i$, with $\alpha_i$ in $V_i$ for all $i$, we can view $P_j(\alpha) = \alpha_j$ as “the part of
$\alpha$ in $V_j$.”

This use of the word “projection” is different from its use in the Cartesian
product situation, and each is different from its use in the quotient space
context (Section 0.12). It is apparent that these three uses are related, and the
ambiguity causes little confusion since the proper meaning is always clear from
the context.

Theorem 5.1. If the maps $P_i$ are the above projections, then $\text{range } P_i = V_i$,
$P_i \circ P_j = 0$ for $i \neq j$, and $\sum_1^n P_i = I$.

Proof. Since $\pi$ is an isomorphism and $P_j = \pi_j \circ \pi^{-1}$, we have $\text{range } P_j =
\text{range } \pi_j = V_j$. Next, it follows directly from the corollary to Lemma 5.1 that
if $\alpha \in V_j$, then $P_i(\alpha) = 0$ for $i \neq j$, and so $P_i \circ P_j = 0$ for $i \neq j$. Finally,
$\sum_1^n P_i = \sum_1^n \pi_i \circ \pi^{-1} = (\sum_1^n \pi_i) \circ \pi^{-1} = \pi \circ \pi^{-1} = I$, and we are done. $\square$

The above projection properties are clearly the reflection in $V$ of the
projection-injection identities for the isomorphic space $\prod_1^n V_i$.
A converse theorem is also true.


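Theorem 5.2. If $\{P_i\}_1^n \subset \mathrm{Hom}\,V$ satisfy $\sum_1^n P_i = I$ and $P_i \circ P_j = 0$ for
$i \neq j$, and if we set $V_i = \text{range } P_i$, then $V = \bigoplus_{i=1}^n V_i$, and $P_i$ is the
corresponding projection on $V_i$.

Proof. The equation $\alpha = I(\alpha) = \sum_1^n P_i(\alpha)$ shows that the subspaces $\{V_i\}_1^n$
span $V$. Next, if $\beta \in V_j$, then $P_i(\beta) = 0$ for $i \neq j$, since $\beta \in \text{range } P_j$ and
$P_i \circ P_j = 0$ if $i \neq j$. Then also $P_j(\beta) = (I - \sum_{i \neq j} P_i)(\beta) = I(\beta) = \beta$.
Now consider $\alpha = \sum_1^n \alpha_i$ for any choice of $\alpha_i \in V_i$. Using the above two facts,
we have $P_j(\alpha) = P_j(\sum_{i=1}^n \alpha_i) = \sum_{i=1}^n P_j(\alpha_i) = \alpha_j$. Therefore, $\alpha = 0$ implies
that $\alpha_j = P_j(0) = 0$ for all $j$, and the subspaces $V_i$ are independent.
Consequently, $V = \bigoplus_1^n V_i$. Finally, the fact that $\alpha = \sum P_i(\alpha)$ and $P_i(\alpha) \in V_i$
for all $i$ shows that $P_j(\alpha)$ is the $j$th component of $\alpha$ for every $\alpha$ and therefore that
$P_j$ is the projection of $V$ onto $V_j$. $\square$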


There is an intrinsic characterization of the kind of map that is a projection. 


Lemma 5.4. The projections $P_i$ are idempotent ($P_i^2 = P_i$), or, equivalently,
each is the identity on its range. The null space of $P_i$ is the sum of the spaces
$V_j$ for $j \neq i$.

Proof. $P_j^2 = P_j \circ (I - \sum_{i \neq j} P_i) = P_j \circ I = P_j$. Since this can be rewritten
as $P_j(P_j(\alpha)) = P_j(\alpha)$ for every $\alpha$ in $V$, it says exactly that $P_j$ is the identity
on its range.

Now set $W_i = \sum_{j \neq i} V_j$, and note that if $\beta \in W_i$, then $P_i(\beta) = 0$ since
$P_i[V_j] = 0$ for $j \neq i$. Thus $W_i \subset N(P_i)$. Conversely, if $P_i(\alpha) = 0$, then $\alpha =
I(\alpha) = \sum_1^n P_j(\alpha) = \sum_{j \neq i} P_j(\alpha) \in W_i$. Thus $N(P_i) \subset W_i$, and the two spaces
are equal. $\square$


Conversely: 


Lemma 5.5. If $P \in \mathrm{Hom}(V)$ is idempotent, then $V$ is the direct sum of its
range and null space, and $P$ is the corresponding projection on its range.

Proof. Setting $Q = I - P$, we have $PQ = P - P^2 = 0$. Therefore, $V$ is the
direct sum of the ranges of $P$ and $Q$, and $P$ is the corresponding projection on its
range, by the above theorem. Moreover, the range of $Q$ is the null space of $P$,
by the corollary. $\square$

If $V = M \oplus N$ and $P$ is the corresponding projection on $M$, we call $P$ the
projection on $M$ along $N$. The projection $P$ is not determined by $M$ alone, since
$M$ does not determine $N$. A pair $P$ and $Q$ in $\mathrm{Hom}\,V$ such that $P + Q = I$ and
$PQ = QP = 0$ is called a pair of complementary projections.
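
As a concrete illustration of complementary projections, the sketch below works in $\mathbb{R}^2$ with $M$ the $x$-axis and two different complements of $M$. The specific subspaces, the matrix representation as nested lists, and the helper names are assumptions made for the example.

```python
# Sketch with 2x2 integer matrices as nested lists (rows).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matsub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

I = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# M = x-axis, N = span of (1, 1). Since (a, b) = (a - b)(1, 0) + b(1, 1),
# the projection on M along N sends (a, b) to (a - b, 0).
P = [[1, -1], [0, 0]]
Q = matsub(I, P)          # the projection on N along M

assert matmul(P, P) == P and matmul(Q, Q) == Q            # idempotent
assert matmul(P, Q) == ZERO and matmul(Q, P) == ZERO      # complementary

# Same M, different complement N' = y-axis: a different projection on M.
P2 = [[1, 0], [0, 0]]
assert matmul(P2, P2) == P2 and P2 != P
```

The last two lines reflect the remark in the text that $P$ is not determined by $M$ alone.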




In the above discussion we have neglected another fine point. Strictly
speaking, when we form the sum $\pi = \sum_1^n \pi_i$, we are treating each $\pi_j$ as though
it were from $\prod_1^n V_i$ to $V$, whereas actually the codomain of $\pi_j$ is $V_j$. And we
want $P_j$ to be from $V$ to $V$, whereas $\pi_j \circ \pi^{-1}$ has codomain $V_j$, so the equation
$P_j = \pi_j \circ \pi^{-1}$ can't quite be true either. To repair these flaws we have to
introduce the injection $\iota_j: V_j \to V$, which is the identity map on $V_j$, but which
views $V_j$ as a subspace of $V$ and so takes $V$ as its codomain. If our concept of a
mapping includes a codomain possibly larger than the range, then we have to
admit such identity injections. Then, setting $\bar\pi_j = \iota_j \circ \pi_j$, we have the correct
equations $\pi = \sum_1^n \bar\pi_j$ and $P_j = \bar\pi_j \circ \pi^{-1}$.


EXERCISES 


5.1 Prove the corollary to Lemma 5.1. 


5.2 Let $\alpha$ be the vector $\langle 1, 1, 1 \rangle$ in $\mathbb{R}^3$, and let $M = \mathbb{R}\alpha$ be its one-dimensional
span. Show that each of the three coordinate planes is a complement of $M$.

5.3 Show that a finite product space $V = \prod_1^n V_i$ has subspaces $\{W_i\}_1^n$ such that
$W_i$ is isomorphic to $V_i$ and $V = \bigoplus_1^n W_i$. Show how the corresponding projections
$\{P_i\}$ are related to the $\pi_i$'s and $\theta_i$'s.

5.4 If $T \in \mathrm{Hom}(V, W)$, show that (the graph of) $T$ is a complement of $W' =
\{0\} \times W$ in $V \times W$.

5.5 If $l$ is a linear functional on $V$ ($l \in \mathrm{Hom}(V, \mathbb{R}) = V^*$), and if $\alpha$ is a vector in $V$
such that $l(\alpha) \neq 0$, show that $V = N \oplus M$, where $N$ is the null space of $l$ and $M = \mathbb{R}\alpha$
is the linear span of $\alpha$. What does this result say about complements in $\mathbb{R}^3$?

5.6 Show that any complement $M$ of a subspace $N$ of a vector space $V$ is isomorphic
to the quotient space $V/N$.

5.7 We suppose again that every subspace has a complement. Show that if
$T \in \mathrm{Hom}\,V$ is not injective, then there is a nonzero $S$ in $\mathrm{Hom}\,V$ such that $T \circ S = 0$.
Show that if $T \in \mathrm{Hom}\,V$ is not surjective, then there is a nonzero $S$ in $\mathrm{Hom}\,V$ such
that $S \circ T = 0$.

5.8 Using the above exercise for half the arguments, show that $T \in \mathrm{Hom}\,V$ is
injective if and only if $T \circ S = 0 \Rightarrow S = 0$ and that $T$ is surjective if and only if
$S \circ T = 0 \Rightarrow S = 0$. We thus have characterizations of injectivity and surjectivity that are
formal, in the sense that they do not refer to the fact that $S$ and $T$ are transformations,
but refer only to the algebraic properties of $S$ and $T$ as elements of an algebra.

5.9 Let $M$ and $N$ be complementary subspaces of a vector space $V$, and let $X$ be a
subspace such that $X \cap N = \{0\}$. Show that there is a linear injection from $X$ to $M$.
[Hint: Consider the projection $P$ of $V$ onto $M$ along $N$.] Show that any two
complements of a subspace $N$ are isomorphic by showing that the above injection is surjective
if and only if $X$ is a complement of $N$.

5.10 Going back to the first point of the preceding exercise, let $Y$ be a complement of
$P[X]$ in $M$. Show that $X \cap Y = \{0\}$ and that $X \oplus Y$ is a complement of $N$.

5.11 Let $M$ be a proper subspace of $V$, and let $\{\alpha_i : i \in I\}$ be a finite set in $V$. Set
$L = L(\{\alpha_i\})$, and suppose that $M + L = V$. Show that there is a subset $J \subset I$ such
that $\{\alpha_i : i \in J\}$ spans a complement of $M$. [Hint: Consider a largest possible subset $J$
such that $M \cap L(\{\alpha_i\}_J) = \{0\}$.]
5.12 Given $T \in \mathrm{Hom}(V, W)$ and $S \in \mathrm{Hom}(W, X)$, show that

a) $S \circ T$ is surjective $\Leftrightarrow$ $S$ is surjective and $R(T) + N(S) = W$;

b) $S \circ T$ is injective $\Leftrightarrow$ $T$ is injective and $R(T) \cap N(S) = \{0\}$;

c) $S \circ T$ is an isomorphism $\Leftrightarrow$ $S$ is surjective, $T$ is injective, and $W = R(T) \oplus N(S)$.

5.13 Assuming that every subspace of $V$ has a complement, show that $T \in \mathrm{Hom}\,V$
satisfies $T^2 = 0$ if and only if $V$ has a direct sum decomposition $V = M \oplus N$ such that
$T = 0$ on $N$ and $T[M] \subset N$.

5.14 Suppose next that $T^3 = 0$ but $T^2 \neq 0$. Show that $V$ can be written as $V =
V_1 \oplus V_2 \oplus V_3$, where $T[V_1] \subset V_2$, $T[V_2] \subset V_3$, and $T = 0$ on $V_3$. (Assume again
that any subspace of a vector space has a complement.)

5.15 We now suppose that $T^n = 0$ but $T^{n-1} \neq 0$. Set $N_i = \text{null space}\,(T^i)$ for
$i = 1, \ldots, n - 1$, and let $V_1$ be a complement of $N_{n-1}$ in $V$. Show first that

$$T[V_1] \cap N_{n-2} = \{0\}$$

and that $T[V_1] \subset N_{n-1}$. Extend $T[V_1]$ to a complement $V_2$ of $N_{n-2}$ in $N_{n-1}$, and
show that in this way we can construct subspaces $V_1, \ldots, V_n$ such that

$$V = \bigoplus_1^n V_i, \qquad T[V_i] \subset V_{i+1} \text{ for } i < n,$$

and

$$T[V_n] = \{0\}.$$


On solving a linear equation. Many important problems in mathematics are 
in the following general form. A linear operator $T: V \to W$ is given, and for a 
given $\eta \in W$ the equation $T(\xi) = \eta$ is to be solved for $\xi \in V$. In our terms, the 
condition that there exist a solution is exactly the condition that $\eta$ be in the 
range space of $T$. In special circumstances this condition can be given more or 
less useful equivalent alternative formulations. Let us suppose that we know 
how to recognize $R(T)$, in which case we may as well make it the new codomain, 
and so assume that $T$ is surjective. There still remains the problem of determining 
what we mean by solving the equation. The universal principle running 
through all the important instances of the problem is that a solution process 
calculates a right inverse to $T$, that is, a linear operator $S: W \to V$ such that 
$T \circ S = I_W$, the identity on $W$. Thus a solution process picks one solution 
vector $\xi \in V$ for each $\eta \in W$ in such a way that the solving $\xi$ varies linearly with 
$\eta$. Taking this as our meaning of solving, we have the following fundamental 
reformulation. 


Theorem 5.3. Let $T$ be a surjective linear map from the vector space $V$ 
to the vector space $W$, and let $N$ be its null space. Then a subspace $M$ is a 
complement of $N$ if and only if the restriction of $T$ to $M$ is an isomorphism 
from $M$ to $W$. The mapping $M \mapsto (T \upharpoonright M)^{-1}$ is a bijection from the set 
of all such complementary subspaces $M$ to the set of all linear right inverses 
of $T$. 




Proof. It should be clear that a subspace $M$ is the range of a linear right inverse 
of $T$ (a map $S$ such that $T \circ S = I_W$) if and only if $T \upharpoonright M$ is an isomorphism to $W$, 
in which case $S = (T \upharpoonright M)^{-1}$. Strictly speaking, the right inverse must be from 
$W$ to $V$ and therefore must be $R = \iota_M \circ S$, where $\iota_M$ is the identity injection 
from $M$ to $V$. Then $(R \circ T)^2 = R \circ (T \circ R) \circ T = R \circ I_W \circ T = R \circ T$, and 
$R \circ T$ is a projection whose range is $M$ and whose null space is $N$ (since $R$ is 
injective). Thus $V = M \oplus N$. Conversely, if $V = M \oplus N$, then $T \upharpoonright M$ is 
injective because $M \cap N = \{0\}$ and surjective because $M + N = V$ implies 
that $W = T[V] = T[M + N] = T[M] + T[N] = T[M] + \{0\} = T[M]$. □ 
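
The correspondence in Theorem 5.3 can be tried out on a small case. The following is a minimal sketch in Python, assuming the illustrative map $T(x, y, z) = \langle x + z, y \rangle$ from $\mathbb{R}^3$ onto $\mathbb{R}^2$; this map, the two complements $M_1$, $M_2$ of its null space, and the names `T`, `S1`, `S2`, `R1T` are our own choices and do not come from the text.

```python
# Illustrative example (an assumption, not from the text):
# T(x, y, z) = (x + z, y) maps R^3 onto R^2; its null space N is spanned by (1, 0, -1).

def T(v):
    x, y, z = v
    return (x + z, y)

def S1(w):
    # Inverse of T restricted to M1 = span{(1,0,0), (0,1,0)}, viewed as a map into R^3.
    u, v = w
    return (u, v, 0)

def S2(w):
    # Inverse of T restricted to M2 = span{(0,1,0), (0,0,1)}, viewed as a map into R^3.
    u, v = w
    return (0, v, u)

def R1T(v):
    # The composition S1 o T: a projection with range M1 and null space N.
    return S1(T(v))

if __name__ == "__main__":
    w = (5, -2)
    print(T(S1(w)), T(S2(w)))    # each right inverse picks a solution of T(xi) = w
    v = (1, 2, 3)
    print(R1T(v), R1T(R1T(v)))   # applying the projection twice changes nothing
```

Two different complements of $N$ give two different right inverses, and each right inverse determines its complement as its range, which is the bijection asserted by the theorem.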


Polynomials in T. The material in this subsection will be used in our study of 
differential equations with constant coefficients and in the proof of the diagonal- 
izability of a symmetric matrix. In linear algebra it is basic in almost any 
approach to the canonical forms of matrices. 

If $p_1(t) = \sum_0^m a_i t^i$ and $p_2(t) = \sum_0^n b_j t^j$ are any two polynomials, then their 
product is the polynomial 

$$p(t) = p_1(t)p_2(t) = \sum_{0}^{m+n} c_k t^k,$$

where $c_k = \sum_{i+j=k} a_i b_j = \sum_{i=0}^{k} a_i b_{k-i}$. Now let $T$ be any fixed element of 
$\operatorname{Hom}(V)$, and for any polynomial $q(t)$ let $q(T)$ be the transformation obtained 
by replacing $t$ by $T$. That is, if $q(t) = \sum_0^l c_k t^k$, then $q(T) = \sum_0^l c_k T^k$, where, of 
course, $T^k$ is the composition product $T \circ T \circ \cdots \circ T$ with $k$ factors. Then the 
bilinearity of composition (Theorem 3.3) shows that if $p(t) = p_1(t)p_2(t)$, 
then $p(T) = p_1(T) \circ p_2(T)$. In particular, any two polynomials in $T$ commute 
with each other under composition. More simply, the commutative law for 
addition implies that 

$$\text{if } p(t) = p_1(t) + p_2(t), \text{ then } p(T) = p_1(T) + p_2(T).$$


The mapping $p(t) \mapsto p(T)$ from the algebra of polynomials to the algebra 
$\operatorname{Hom}(V)$ thus preserves addition, multiplication, and (obviously) scalar multiplication. 
That is, it preserves all the operations of an algebra and is therefore 
what is called an (algebra) homomorphism. 
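
The homomorphism property can be checked numerically on a small example. The sketch below is only an illustration under our own assumptions: polynomials are stored as coefficient lists (lowest degree first), $T$ is a $2 \times 2$ integer matrix standing in for an element of $\operatorname{Hom}(\mathbb{R}^2)$, and the helper names (`poly_mul`, `poly_at`, and so on) are hypothetical.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_mul(p, q):
    # c_k is the sum of a_i * b_j over all i + j = k.
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c

def poly_at(p, T):
    # Replace t by T: p(T) = sum of p[k] * T^k, with T^0 the identity.
    n = len(T)
    result = [[0] * n for _ in range(n)]
    power = identity(n)
    for coeff in p:
        result = mat_add(result, mat_scale(coeff, power))
        power = mat_mul(power, T)
    return result

T = [[1, 2], [3, 4]]
p1 = [1, 1]        # 1 + t
p2 = [-2, 0, 1]    # -2 + t^2

if __name__ == "__main__":
    lhs = poly_at(poly_mul(p1, p2), T)
    rhs = mat_mul(poly_at(p1, T), poly_at(p2, T))
    print(lhs == rhs)   # (p1 p2)(T) agrees with p1(T) o p2(T)
```

The same helpers can be used to confirm that $p_1(T)$ and $p_2(T)$ commute, even though matrix multiplication in general does not.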

The word “homomorphism” is a general term describing a mapping $\theta$ 
between two algebraic systems of the same kind such that $\theta$ preserves the 
operations of the system. Thus a homomorphism between vector spaces is 
simply a linear transformation, and a homomorphism between groups is a 
mapping preserving the one group operation. An accessible, but not really 
typical, example of the latter is the logarithm function, which is a homomorphism 
from the multiplicative group of positive real numbers to the additive group of $\mathbb{R}$. 
The logarithm function is actually a bijective homomorphism and is therefore 
a group isomorphism. 

If this were a course in algebra, we would show that the division algorithm 
and the properties of the degree of a polynomial imply the following theorem. 
(However, see Exercises 5.16 through 5.20.) 




Theorem 5.4. If $p_1(t)$ and $p_2(t)$ are relatively prime polynomials, then there 
exist polynomials $a_1(t)$ and $a_2(t)$ such that 

$$a_1(t)p_1(t) + a_2(t)p_2(t) = 1.$$


By relatively prime we mean having no common factors except constants. 
We shall assume this theorem and the results of the discussion preceding it in 
proving our next theorem. 

We say that a subspace $M \subset V$ is invariant under $T \in \operatorname{Hom}(V)$ if $T[M] \subset M$ 
[that is, $T \upharpoonright M \in \operatorname{Hom}(M)$]. 


Theorem 5.5. Let $T$ be any transformation in $\operatorname{Hom} V$, and let $q$ be any 
polynomial. Then the null space $N$ of $q(T)$ is invariant under $T$, and if 
$q = q_1 q_2$ is any factorization of $q$ into relatively prime factors and $N_1$ and 
$N_2$ are the null spaces of $q_1(T)$ and $q_2(T)$, respectively, then $N = N_1 \oplus N_2$. 


Proof. Since $T \circ q(T) = q(T) \circ T$, we see that if $q(T)(\alpha) = 0$, then $q(T)(T\alpha) = 
T(q(T)(\alpha)) = 0$, so $T[N] \subset N$. Note also that since $q(T) = q_1(T) \circ q_2(T)$, 
it follows that any $\alpha$ in $N_2$ is also in $N$, so $N_2 \subset N$. Similarly, $N_1 \subset N$. We can 
therefore replace $V$ by $N$ and $T$ by $T \upharpoonright N$; hence we can assume that $T \in \operatorname{Hom} N$ 
and $q(T) = q_1(T) \circ q_2(T) = 0$. 

Now choose polynomials $a_1$ and $a_2$ so that $a_1 q_1 + a_2 q_2 = 1$. Since $p \mapsto p(T)$ 
is an algebraic homomorphism, we then have 

$$a_1(T) \circ q_1(T) + a_2(T) \circ q_2(T) = I.$$

Set $A_1 = a_1(T)$, etc., so that $A_1 \circ Q_1 + A_2 \circ Q_2 = I$, $Q_1 \circ Q_2 = 0$, and all the 
operators $A_i$, $Q_i$ commute with each other. Finally, set $P_i = A_i \circ Q_i = Q_i \circ A_i$ 
for $i = 1, 2$. Then $P_1 + P_2 = I$ and $P_1 P_2 = P_2 P_1 = 0$. Thus $P_1$ and $P_2$ are 
projections, and $N$ is the direct sum of their ranges: $N = V_1 \oplus V_2$. Since each 
range is the null space of the other projection, we can rewrite this as $N = 
N_1 \oplus N_2$, where $N_i = N(P_i)$. It remains for us to show that $N(P_i) = N(Q_i)$. 
Note first that since $Q_1 \circ P_2 = Q_1 \circ Q_2 \circ A_2 = 0$, we have $Q_1 = Q_1 \circ I = 
Q_1 \circ (P_1 + P_2) = Q_1 \circ P_1$. Then the two identities $P_i = A_i \circ Q_i$ and $Q_i = 
Q_i \circ P_i$ show that the null space of each of $P_i$ and $Q_i$ is included in the other, and 
so they are equal. This completes the proof of the theorem. □ 
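
The construction in this proof can be followed step by step in a small case. The sketch below assumes an example of our own: $T$ is the coordinate swap on $\mathbb{R}^2$, so that $q(T) = 0$ for $q(t) = t^2 - 1 = (t - 1)(t + 1)$, and the constants $a_1 = -\tfrac12$, $a_2 = \tfrac12$ satisfy $a_1 q_1 + a_2 q_2 = 1$. The helper names are hypothetical.

```python
from fractions import Fraction as Fr

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

I = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
T = [[Fr(0), Fr(1)], [Fr(1), Fr(0)]]   # assumed example: T swaps the two coordinates

Q1 = mat_add(T, mat_scale(-1, I))      # q1(T) = T - I
Q2 = mat_add(T, I)                     # q2(T) = T + I
A1 = mat_scale(Fr(-1, 2), I)           # a1(t) = -1/2
A2 = mat_scale(Fr(1, 2), I)            # a2(t) =  1/2, so a1*q1 + a2*q2 = 1
P1 = mat_mul(A1, Q1)                   # projection with null space N(Q1)
P2 = mat_mul(A2, Q2)                   # projection with null space N(Q2)

if __name__ == "__main__":
    v = [3, 5]
    print(apply(P2, v), apply(P1, v))  # the components of v in N(Q1) and N(Q2)
```

Here the range of $P_2$ is $N_1 = N(T - I)$ and the range of $P_1$ is $N_2 = N(T + I)$, so every vector splits as $v = P_2 v + P_1 v$ with the two pieces in $N_1$ and $N_2$.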


Corollary. Let $p(t) = \prod_1^m p_i(t)$ be a factorization of the polynomial 
$p(t)$ into relatively prime factors, let $T$ be an element of $\operatorname{Hom}(V)$, and set 
$N_i = N(p_i(T))$ for $i = 1, \ldots, m$ and $N = N(p(T))$. Then $N$ and all the 
$N_i$ are invariant under $T$, and $N = \bigoplus_1^m N_i$. 


Proof. The proof is by induction on $m$. The theorem is the case $m = 2$, and if 
we set $q = \prod_2^m p_i(t)$ and $M = N(q(T))$, then the theorem implies that 
$N = N_1 \oplus M$ and that $N_1$ and $M$ are invariant under $T$. Restricting $T$ to $M$, 
we see that the inductive hypothesis implies that $M = \bigoplus_2^m N_i$ and that $N_i$ is 
invariant under $T$ for $i = 2, \ldots, m$. The corollary to Lemma 5.3 then yields 
our result. □ 


EXERCISES 


5.16 Presumably the reader knows (or can see) that the degree $d(P)$ of a polynomial 
$P$ satisfies the laws 

$$d(P + Q) \le \max\{d(P), d(Q)\},$$
$$d(P \cdot Q) = d(P) + d(Q) \quad \text{if both } P \text{ and } Q \text{ are nonzero.}$$

The degree of the zero polynomial is undefined. (It would have to be $-\infty$!) By induction 
on the degree of $P$, prove that for any two polynomials $P$ and $D$, with $D \neq 0$, 
there are polynomials $Q$ and $R$ such that $P = DQ + R$ and $d(R) < d(D)$ or $R = 0$. 
[Hint: If $d(P) < d(D)$, we can take $Q$ and $R$ as what? If $d(P) \ge d(D)$, and if the leading 
terms of $P$ and $D$ are $ax^n$ and $bx^m$, respectively, with $n \ge m$, then the polynomial 

$$P' = P - \left(\frac{a}{b}\right) x^{n-m} D$$

has degree less than $d(P)$, so $P' = DQ' + R'$ by the inductive hypothesis. Now finish 
the proof.] 


5.17 Assuming the above result, prove that $R$ and $Q$ are uniquely determined by 
$P$ and $D$. (Assume also that $P = DQ' + R'$, and prove from the properties of degree 
that $R' = R$ and $Q' = Q$.) These two results together constitute the division algorithm 
for polynomials. 
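
For readers who want to experiment, the reduction step in the hint to Exercise 5.16 can be run as a loop. This is a sketch under our own conventions, not part of the exercises: polynomials are coefficient lists with the lowest degree first, the zero polynomial is the empty list, and the names `trim` and `poly_divmod` are hypothetical. It computes $Q$ and $R$ but of course proves nothing.

```python
from fractions import Fraction

def trim(p):
    # Drop trailing zero coefficients; the zero polynomial becomes [].
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(P, D):
    # Return (Q, R) with P = D*Q + R and either R == [] or deg R < deg D.
    P = [Fraction(c) for c in trim(P)]
    D = [Fraction(c) for c in trim(D)]
    if not D:
        raise ZeroDivisionError("D must be a nonzero polynomial")
    Q = [Fraction(0)] * max(len(P) - len(D) + 1, 0)
    R = P
    while len(R) >= len(D):
        shift = len(R) - len(D)      # n - m in the notation of the hint
        coeff = R[-1] / D[-1]        # a / b, the ratio of leading coefficients
        Q[shift] = coeff
        # Subtract (a/b) x^(n-m) D; this cancels the leading term of R.
        R = trim([r - coeff * (D[i - shift] if 0 <= i - shift < len(D) else 0)
                  for i, r in enumerate(R)])
    return Q, R

if __name__ == "__main__":
    # x^3 - 1 divided by x - 1
    print(poly_divmod([-1, 0, 0, 1], [-1, 1]))
```

Exact rational arithmetic is used so that the leading term cancels exactly at each step, which is what guarantees that the loop terminates.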


5.18 If $P$ is any polynomial 

$$P(x) = \sum_{0}^{n} a_k x^k,$$

and if $t$ is any number, then of course $P(t)$ is the number 

$$\sum_{0}^{n} a_k t^k.$$

Prove from the division algorithm that for any polynomial $P$ and any number $t$ there 
is a polynomial $Q$ such that 

$$P(x) = (x - t)Q(x) + P(t),$$

and therefore that $P(x)$ is divisible by $(x - t)$ if and only if $P(t) = 0$. 


5.19 Let $P$ and $Q$ be nonzero polynomials, and choose polynomials $A_0$ and $B_0$ such 
that among all the polynomials of the form $AP + BQ$ the polynomial 

$$D = A_0 P + B_0 Q$$

is nonzero and has minimum degree. Prove that $D$ is a factor of both $P$ and $Q$. (Suppose 
that $D$ does not divide $P$ and apply the division algorithm to get a contradiction 
with the choice of $A_0$ and $B_0$.) 

5.20 Let $P$ and $Q$ be nonzero relatively prime polynomials. This means that if $E$ is a 
common factor of $P$ and $Q$ ($P = EP'$, $Q = EQ'$), then $E$ is a constant. Prove that 
there are polynomials $A$ and $B$ such that $A(x)P(x) + B(x)Q(x) = 1$. (Apply the 
above exercise.) 




5.21 In the context of Theorem 5.5, show that the restriction of $q_2(T) = Q_2$ to $N_1$ 
is an isomorphism (from $N_1$ to $N_1$). 
5.22 An involution on $V$ is a mapping $T \in \operatorname{Hom} V$ such that $T^2 = I$. Show that if 
$T$ is an involution, then $V$ is a direct sum $V = V_1 \oplus V_2$, where $T(\xi) = \xi$ for every 
$\xi \in V_1$ ($T = I$ on $V_1$) and $T(\xi) = -\xi$ for every $\xi \in V_2$ ($T = -I$ on $V_2$). (Apply 
Theorem 5.5.) 
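5.23 We noticed earlier (in an exercise) that if $\varphi$ is any mapping from a set $A$ to a 
set $B$, then $f \mapsto f \circ \varphi$ is a linear map $T_\varphi$ from $\mathbb{R}^B$ to $\mathbb{R}^A$. Show now that if $\psi: B \to C$, 
then 

$$T_{\psi \circ \varphi} = T_\varphi \circ T_\psi.$$

(This should turn out to be a direct consequence of the associativity of composition.) 
5.24 Let $A$ be any set, and let $\varphi: A \to A$ be such that $\varphi \circ \varphi(a) = a$ for every $a$. 
Then $T_\varphi: f \mapsto f \circ \varphi$ is an involution on $V = \mathbb{R}^A$ (since $T_{\varphi \circ \varphi} = T_\varphi \circ T_\varphi$). Show that 
the decomposition of $\mathbb{R}^{\mathbb{R}}$ as the direct sum of the subspace of even functions and the 
subspace of odd functions arises from an involution on $\mathbb{R}^{\mathbb{R}}$ defined by such a map 
$\varphi: \mathbb{R} \to \mathbb{R}$. 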

5.25 Let $V$ be a subspace of $\mathbb{R}^{\mathbb{R}}$ consisting of differentiable functions, and suppose 
that $V$ is invariant under differentiation ($f \in V \Rightarrow Df \in V$). Suppose also that on $V$ 
the linear operator $D \in \operatorname{Hom} V$ satisfies $D^2 - 2D - 3I = 0$. Prove that $V$ is the 
direct sum of two subspaces $M$ and $N$ such that $D = 3I$ on $M$ and $D = -I$ on $N$. 
Actually, it follows that $M$ is the linear span of a single vector, and similarly for $N$. 
Find these two functions, if you can. ($f' = 3f \Rightarrow f = ?$) 


*Block decompositions of linear maps. Given $T$ in $\operatorname{Hom} V$ and a direct sum 
decomposition $V = \bigoplus_1^n V_i$, with corresponding projections $\{P_i\}_1^n$, we can 
consider the maps $T_{ij} = P_i \circ T \circ P_j$. Although $T_{ij}$ is from $V$ to $V$, we may also 
want to consider it as being from $V_j$ to $V_i$ (in which case, strictly speaking, what 
is it?). We picture the $T_{ij}$'s arranged schematically in a rectangular array 
similar to a matrix, as indicated below for $n = 2$. 

$$\left[\begin{array}{c|c} T_{11} & T_{12} \\ \hline T_{21} & T_{22} \end{array}\right]$$

Furthermore, since $T = \sum_{i,j} T_{ij}$, we call the doubly indexed family the block 
decomposition of $T$ associated with the given direct sum decomposition of $V$. 

More generally, if $T \in \operatorname{Hom}(V, W)$ and $W$ also has a direct sum decomposition 
$W = \bigoplus_1^m W_i$, with corresponding projections $\{Q_i\}_1^m$, then the family 
$\{T_{ij}\}$ defined by $T_{ij} = Q_i \circ T \circ P_j$ and pictured as an $m \times n$ rectangular array 
is the block decomposition of $T$ with respect to the two direct sum decompositions. 

Whenever $T$ in $\operatorname{Hom} V$ has a special relationship to a particular direct sum 
decomposition of $V$, the corresponding block diagram may have features that 
display these special properties in a vivid way; this then helps us to understand 
the nature of $T$ better and to calculate with it more easily. 




For example, if $V = V_1 \oplus V_2$, then $V_1$ is invariant under $T$ (i.e., $T[V_1] \subset V_1$) 
if and only if the block diagram is upper triangular, as shown in the following 
diagram. 

$$\left[\begin{array}{c|c} T_{11} & T_{12} \\ \hline 0 & T_{22} \end{array}\right]$$

Suppose, next, that $T^2 = 0$. Letting $V_1$ be the range of $T$, and supposing that 
$V_1$ has a complement $V_2$, the reader should clearly see that the corresponding 
block diagram is 

$$\left[\begin{array}{c|c} 0 & T_{12} \\ \hline 0 & 0 \end{array}\right]$$

This form is called strictly upper triangular; it is upper triangular and also zero 
on the main diagonal. Conversely, if $T$ has some strictly upper-triangular 
$2 \times 2$ block diagram, then $T^2 = 0$. 

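If $R$ is a composition product, $R = ST$, then its block components can be 
computed in terms of those of $S$ and $T$. Thus 

$$R_{ik} = P_i R P_k = P_i S T P_k = P_i S \Big(\sum_{j=1}^{n} P_j\Big) T P_k = \sum_{j=1}^{n} S_{ij} T_{jk}.$$

We have used the identities $I = \sum_{j=1}^{n} P_j$ and $P_j = P_j^2$. The $2 \times 2$ case is 
pictured below. 

$$\left[\begin{array}{c|c} S_{11}T_{11} + S_{12}T_{21} & S_{11}T_{12} + S_{12}T_{22} \\ \hline S_{21}T_{11} + S_{22}T_{21} & S_{21}T_{12} + S_{22}T_{22} \end{array}\right]$$

From this we can read off a fact that will be useful to us later: If $T$ is $2 \times 2$ 
upper triangular ($T_{21} = 0$), and if $T_{ii}$ is invertible as a map from $V_i$ to 
$V_i$ ($i = 1, 2$), then $T$ is invertible and its inverse is 

$$\left[\begin{array}{c|c} T_{11}^{-1} & -T_{11}^{-1} T_{12} T_{22}^{-1} \\ \hline 0 & T_{22}^{-1} \end{array}\right]$$

We find this solution by simply setting the product diagram equal to 

$$\left[\begin{array}{c|c} I & 0 \\ \hline 0 & I \end{array}\right]$$

and solving; but of course with the diagram in hand it can simply be checked to 
be correct. 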
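
The check mentioned in the last sentence can also be done numerically. The following sketch assumes concrete $2 \times 2$ blocks of our own choosing (so that $T$ is a $4 \times 4$ matrix on $\mathbb{R}^2 \oplus \mathbb{R}^2$); the helper names `inv2` and `assemble` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction as Fr

def frac(A):
    return [[Fr(x) for x in row] for row in A]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_neg(A):
    return [[-a for a in row] for row in A]

def inv2(A):
    # Inverse of an invertible 2 x 2 matrix by the adjugate formula.
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def assemble(B11, B12, B21, B22):
    # Build a 4 x 4 matrix from four 2 x 2 blocks.
    top = [r1 + r2 for r1, r2 in zip(B11, B12)]
    bottom = [r1 + r2 for r1, r2 in zip(B21, B22)]
    return top + bottom

ZERO = frac([[0, 0], [0, 0]])
T11 = frac([[1, 2], [0, 1]])   # assumed invertible diagonal block
T12 = frac([[1, 0], [2, 1]])   # arbitrary off-diagonal block
T22 = frac([[2, 1], [1, 1]])   # assumed invertible diagonal block

S11 = inv2(T11)
S22 = inv2(T22)
S12 = mat_neg(mat_mul(mat_mul(S11, T12), S22))   # -T11^{-1} T12 T22^{-1}

T = assemble(T11, T12, ZERO, T22)
S = assemble(S11, S12, ZERO, S22)
I4 = frac([[1 if i == j else 0 for j in range(4)] for i in range(4)])

if __name__ == "__main__":
    print(mat_mul(T, S) == I4, mat_mul(S, T) == I4)
```

Note that the order of the factors in $-T_{11}^{-1} T_{12} T_{22}^{-1}$ matters, since the blocks do not commute in general.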


EXERCISES 


5.26 Show that if $T \in \operatorname{Hom} V$, if $V = \bigoplus_1^n V_i$, and if $\{P_i\}_1^n$ are the corresponding 
projections, then the sum of the transformations $T_{ij} = P_i \circ T \circ P_j$ is $T$. 




5.27 If $S$ and $T$ are in $\operatorname{Hom} V$ and $\{S_{ij}\}$, $\{T_{ij}\}$ are their block components with 
respect to some direct sum decomposition of $V$, show that $S_{ij} \circ T_{lk} = 0$ if $j \neq l$. 


5.28 Verify that if $T$ has an upper-triangular block diagram with respect to the 
direct sum decomposition $V = V_1 \oplus V_2$, then $V_1$ is invariant under $T$. 

5.29 Verify that if the diagram is strictly upper triangular, then T? = 0. 

5.30 Show that if $V = V_1 \oplus V_2 \oplus V_3$ and $T \in \operatorname{Hom} V$, then the subspaces $V_i$ are 
all invariant under $T$ if and only if the block diagram for $T$ is 

$$\begin{bmatrix} T_{11} & 0 & 0 \\ 0 & T_{22} & 0 \\ 0 & 0 & T_{33} \end{bmatrix}$$

Show that $T$ is invertible if and only if $T_{ii}$ is invertible (as an element of $\operatorname{Hom} V_i$) 
for each $i$. 

5.31 Supposing that $T$ has an upper-triangular $2 \times 2$ block diagram and that $T_{ii}$ 
is invertible as an element of $\operatorname{Hom} V_i$ for $i = 1, 2$, verify that $T$ is invertible by forming 
the $2 \times 2$ block diagram that is the product of the diagram for $T$ and the diagram 
given in the text as the inverse of $T$. 

5.32 Supposing that $T$ is as in the preceding exercise, show that $S = T^{-1}$ must have 
the given block diagram by considering the two equations $T \circ S = I$ and $S \circ T = I$ 
in their block form. 

5.33 What would strictly upper triangular mean for a $3 \times 3$ block diagram? What 
is the corresponding property of $T$? Show that $T$ has this property if and only if it has 
a strictly upper-triangular block diagram. (See Exercise 5.14.) 

5.34 Suppose that $T$ in $\operatorname{Hom} V$ satisfies $T^n = 0$ (but $T^{n-1} \neq 0$). Show that $T$ has 
a strictly upper-triangular $n \times n$ block decomposition. (Apply Exercise 5.15.) 


6. BILINEARITY 


Bilinear mappings. The notion of a bilinear mapping is important to the un- 
derstanding of linear algebra because it is the vector setting for the duality 
principle (Section 0.10). 


Definition. If $U$, $V$, and $W$ are vector spaces, then a mapping 

$$\omega: \langle \xi, \eta \rangle \mapsto \omega(\xi, \eta)$$

from $U \times V$ to $W$ is bilinear if it is linear in each variable when the other 
variable is held fixed. 


That is, if we hold $\xi$ fixed, then $\eta \mapsto \omega(\xi, \eta)$ is linear [and so belongs to 
$\operatorname{Hom}(V, W)$]; if we hold $\eta$ fixed, then similarly $\omega(\xi, \eta)$ is in $\operatorname{Hom}(U, W)$ as a 
function of $\xi$. This is not the same notion as linearity on the product vector 
space $U \times V$. For example, $\langle x, y \rangle \mapsto x + y$ is a linear mapping from $\mathbb{R}^2$ 
to $\mathbb{R}$, but it is not bilinear. If $y$ is held fixed, then the mapping $x \mapsto x + y$ is 
affine (translation through $y$), but it is not linear unless $y$ is 0. On the other 
hand, $\langle x, y \rangle \mapsto xy$ is a bilinear mapping from $\mathbb{R}^2$ to $\mathbb{R}$, but it is not linear. If $y$ 
is held fixed, then the mapping $x \mapsto yx$ is linear. But the sum of two ordered 
couples does not map to the sum of their images: 

$$\langle x, y \rangle + \langle u, v \rangle = \langle x + u, y + v \rangle \mapsto (x + u)(y + v),$$

which is not the sum of the images, $xy + uv$. Similarly, the scalar product 
$(\mathbf{x}, \mathbf{y}) = \sum_1^n x_i y_i$ is bilinear from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$, as we observed in Section 2. 
The linear meaning of bilinearity is partially explained in the following 
theorem. 


Theorem 6.1. If $\omega: U \times V \to W$ is bilinear, then, by duality, $\omega$ is equivalent 
to a linear mapping from $U$ to $\operatorname{Hom}(V, W)$ and also to a linear mapping 
from $V$ to $\operatorname{Hom}(U, W)$. 


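Proof. For each fixed $\eta \in V$ let $\omega_\eta$ be the mapping $\xi \mapsto \omega(\xi, \eta)$. That is, 
$\omega_\eta(\xi) = \omega(\xi, \eta)$. Then $\omega_\eta \in \operatorname{Hom}(U, W)$ by the bilinear hypothesis. The 
mapping $\eta \mapsto \omega_\eta$ is thus from $V$ to $\operatorname{Hom}(U, W)$, and its linearity is due to the 
linearity of $\omega$ in $\eta$ when $\xi$ is held fixed: 

$$\omega_{c\eta + d\zeta}(\xi) = \omega(\xi, c\eta + d\zeta) = c\,\omega(\xi, \eta) + d\,\omega(\xi, \zeta) = c\,\omega_\eta(\xi) + d\,\omega_\zeta(\xi),$$

so that 

$$\omega_{c\eta + d\zeta} = c\,\omega_\eta + d\,\omega_\zeta.$$

Similarly, if we define $\omega^\xi$ by $\omega^\xi(\eta) = \omega(\xi, \eta)$, then $\xi \mapsto \omega^\xi$ is a linear mapping 
from $U$ to $\operatorname{Hom}(V, W)$. Conversely, if $\varphi: U \to \operatorname{Hom}(V, W)$ is linear, then the 
function $\omega$ defined by $\omega(\xi, \eta) = \varphi(\xi)(\eta)$ is bilinear. Moreover, $\omega^\xi = \varphi(\xi)$, so 
that $\varphi$ is the mapping $\xi \mapsto \omega^\xi$. □ 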
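
In programming terms the passage from $\omega$ to $\eta \mapsto \omega_\eta$ is what is usually called currying. The sketch below is a small illustration under our own assumptions: it takes $\omega(x, y) = xy$ on $\mathbb{R} \times \mathbb{R}$ (the example used earlier in this section) with integer test values, and the names `omega` and `fix_second` are hypothetical.

```python
def omega(x, y):
    # The bilinear map <x, y> -> xy from R x R to R.
    return x * y

def fix_second(w, eta):
    # Hold the second variable fixed: returns the map xi -> w(xi, eta).
    return lambda xi: w(xi, eta)

if __name__ == "__main__":
    x, u, y, v = 1, 3, 2, 4
    c, d = 5, -7

    # Linear in the first variable when the second is held fixed.
    print(omega(c * x + d * u, y) == c * omega(x, y) + d * omega(u, y))

    # The assignment eta -> omega_eta is itself linear.
    lhs = fix_second(omega, c * y + d * v)(x)
    rhs = c * fix_second(omega, y)(x) + d * fix_second(omega, v)(x)
    print(lhs == rhs)

    # But omega is not linear on the product space.
    print(omega(x + u, y + v) == omega(x, y) + omega(u, v))
```

The first two comparisons hold and the third fails, which is exactly the distinction between bilinearity and linearity on $U \times V$.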


We shall see that bilinearity occurs frequently. Sometimes the reinterpretation 
provided by the above theorem provides new insights; at other times it 
seems less helpful. 

For example, the composition map $\langle S, T \rangle \mapsto S \circ T$ is bilinear, and the 
corollary of Theorem 3.3, which in effect states that composition on the right by 
a fixed $T$ is a linear map, is simply part of an explicit statement of the bilinearity. 
But the linear map $T \mapsto$ composition by $T$ is a complicated object that we have 
no need for except in the case $W = \mathbb{R}$. 

On the other hand, the linear combination formula $\sum_1^n x_i \alpha_i$ and Theorem 1.2 
do receive new illumination. 


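Theorem 6.2. The mapping $\omega(\mathbf{x}, \boldsymbol{\alpha}) = \sum_1^n x_i \alpha_i$ is bilinear from $\mathbb{R}^n \times V^n$ 
to $V$. The mapping $\boldsymbol{\alpha} \mapsto \omega_{\boldsymbol{\alpha}}$ is therefore a linear mapping from $V^n$ to 
$\operatorname{Hom}(\mathbb{R}^n, V)$, and, in fact, is an isomorphism. 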


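Proof. The linearity of $\omega$ in $\mathbf{x}$ for a fixed $\boldsymbol{\alpha}$ was proved in Theorem 1.2, and its 
linearity in $\boldsymbol{\alpha}$ for a fixed $\mathbf{x}$ is seen in the same way. Then $\boldsymbol{\alpha} \mapsto \omega_{\boldsymbol{\alpha}}$ is linear by 
Theorem 6.1. Its bijectivity was implicit in Theorem 1.2. □ 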


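It should be remarked that we can use any finite index set $I$ just as well as 
the special set $\bar{n}$ and conclude that $\omega(\mathbf{x}, \boldsymbol{\alpha}) = \sum_{i \in I} x_i \alpha_i$ is bilinear from $\mathbb{R}^I \times V^I$ 
to $V$ and that $\boldsymbol{\alpha} \mapsto \omega_{\boldsymbol{\alpha}}$ is an isomorphism from $V^I$ to $\operatorname{Hom}(\mathbb{R}^I, V)$. Also note 
that $\omega_{\boldsymbol{\alpha}} = L_{\boldsymbol{\alpha}}$ in the terminology of Section 1. 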


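Corollary. The scalar product $(\mathbf{x}, \mathbf{a}) = \sum_1^n x_i a_i$ is bilinear from $\mathbb{R}^n \times \mathbb{R}^n$ 
to $\mathbb{R}$; therefore, $\mathbf{a} \mapsto \omega_{\mathbf{a}} = L_{\mathbf{a}}$ is an isomorphism from $\mathbb{R}^n$ to $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R})$. 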


Natural isomorphisms. We often find two vector spaces related to each other 
in such a way that a particular isomorphism between them is singled out. This 
phenomenon is hard to pin down in general terms but easy to describe by 
examples. 

Duality is one source of such “natural” isomorphisms. For example, an 
$m \times n$ matrix $\{t_{ij}\}$ is a real-valued function of the two variables $\langle i, j \rangle$, and 
as such it is an element of the Cartesian space $\mathbb{R}^{\bar{m} \times \bar{n}}$. We can also view $\{t_{ij}\}$ as 
a sequence of $n$ column vectors in $\mathbb{R}^m$. This is the dual point of view where we 
hold $j$ fixed and obtain a function of $i$ for each $j$. From this point of view $\{t_{ij}\}$ 
is an element of $(\mathbb{R}^m)^n$. This correspondence between $\mathbb{R}^{\bar{m} \times \bar{n}}$ and $(\mathbb{R}^m)^n$ is clearly 
an isomorphism, and is an example of a natural isomorphism. 

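We review next the various ways of looking at Cartesian $n$-space itself. 
One standard way of defining an ordered $n$-tuplet is by induction. The ordered 
triplet $\langle x, y, z \rangle$ is defined as the ordered pair $\langle \langle x, y \rangle, z \rangle$, and the ordered 
$n$-tuplet $\langle x_1, \ldots, x_n \rangle$ is defined as $\langle \langle x_1, \ldots, x_{n-1} \rangle, x_n \rangle$. Thus we 
define $\mathbb{R}^n$ inductively by setting $\mathbb{R}^1 = \mathbb{R}$ and $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$. 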

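The ordered $n$-tuplet can also be defined as the function on $\bar{n} = \{1, \ldots, n\}$ 
which assigns $x_i$ to $i$. Then 

$$\langle x_1, \ldots, x_n \rangle = \{\langle 1, x_1 \rangle, \ldots, \langle n, x_n \rangle\},$$

and Cartesian $n$-space is $\mathbb{R}^{\bar{n}} = \mathbb{R}^{\{1, \ldots, n\}}$. 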


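Finally, we often wish to view Cartesian $(n + m)$-space as the Cartesian 
product of Cartesian $n$-space with Cartesian $m$-space, so we now take 

$$\langle x_1, \ldots, x_{n+m} \rangle \quad \text{as} \quad \langle \langle x_1, \ldots, x_n \rangle, \langle x_{n+1}, \ldots, x_{n+m} \rangle \rangle$$

and $\mathbb{R}^{n+m}$ as $\mathbb{R}^n \times \mathbb{R}^m$. 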

Here again if we pair two different models for the same $n$-tuplet, we have an 
obvious natural isomorphism between the corresponding models for Cartesian 
$n$-space. 

Finally, the characteristic properties of Cartesian product spaces given in 
Theorems 3.6 and 3.7 yield natural isomorphisms. Theorem 3.6 says that an 
$n$-tuple of linear maps $\{T_i\}_1^n$ on a common domain $V$ is equivalent to a single 
$n$-tuple-valued map $T$, where $T(\xi) = \langle T_1(\xi), \ldots, T_n(\xi) \rangle$ for all $\xi \in V$. 
(This is duality again! $T_i(\xi)$ is a function of the two variables $i$ and $\xi$.) And it is 
not hard to see that this identification of $T$ with $\{T_i\}_1^n$ is an isomorphism from 
$\prod_1^n \operatorname{Hom}(V, W_i)$ to $\operatorname{Hom}(V, \prod_1^n W_i)$. 

Similarly, Theorem 3.7 identifies an $n$-tuple of linear maps $\{T_i\}_1^n$ into a common 
codomain $V$ with a single linear map $T$ of an $n$-tuple variable, and this identification 
is a natural isomorphism from $\prod_1^n \operatorname{Hom}(W_i, V)$ to $\operatorname{Hom}(\prod_1^n W_i, V)$. 




An arbitrary isomorphism between two vector spaces identifies them in a 
transient way. For the moment we think of the vector spaces as representing the 
same abstract space, but only so long as the isomorphism is before us. If we 
shift to a different isomorphism between them, we obtain a new temporary 
identification. Natural isomorphisms, on the other hand, effect permanent 
identifications, and we think of paired objects as being two aspects of the same 
object in a deeper sense. Thus we think of a matrix as “being” either a sequence 
of row vectors, a sequence of column vectors, or a single function of two integer 
indices. We shall take a final look at this question at the end of Section 3 in the 
next chapter. 


*We can now make the ultimate dissection of the theorems centering 
around the linear combination formula. Laws S1 through S3 state exactly that 
the scalar product $x\alpha$ is bilinear. More precisely, they state that the mapping 
$S: \langle x, \alpha \rangle \mapsto x\alpha$ from $\mathbb{R} \times W$ to $W$ is bilinear. In the language of Theorem 6.1, 
$x\alpha = \omega_\alpha(x)$, and from that theorem we conclude that the mapping $\alpha \mapsto \omega_\alpha$ is 
an isomorphism from $W$ to $\operatorname{Hom}(\mathbb{R}, W)$. 

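This isomorphism between $W$ and $\operatorname{Hom}(\mathbb{R}, W)$ extends to an isomorphism 
from $W^n$ to $(\operatorname{Hom}(\mathbb{R}, W))^n$, which in turn is naturally isomorphic to 
$\operatorname{Hom}(\mathbb{R}^n, W)$ by the second Cartesian product isomorphism. Thus $W^n$ is naturally 
isomorphic to $\operatorname{Hom}(\mathbb{R}^n, W)$; the mapping is $\boldsymbol{\alpha} \mapsto L_{\boldsymbol{\alpha}}$, where 
$L_{\boldsymbol{\alpha}}(\mathbf{x}) = \sum_1^n x_i \alpha_i$. 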

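In particular, $\mathbb{R}^n$ is naturally isomorphic to the space $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R})$ of all 
linear functionals on $\mathbb{R}^n$, the $n$-tuple $\mathbf{a}$ corresponding to the functional $\omega_{\mathbf{a}}$ 
defined by $\omega_{\mathbf{a}}(\mathbf{x}) = \sum_1^n a_i x_i$. 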

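Also, $(\mathbb{R}^m)^n$ is naturally isomorphic to $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$. And since $\mathbb{R}^{\bar{m} \times \bar{n}}$ is 
naturally isomorphic to $(\mathbb{R}^m)^n$, it follows that the spaces $\mathbb{R}^{\bar{m} \times \bar{n}}$ and $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ 
are naturally isomorphic. This is simply our natural association of a transformation 
$T$ in $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ to an $m \times n$ matrix $\{t_{ij}\}$. 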


CHAPTER 2 


FINITE-DIMENSIONAL VECTOR SPACES 


We have defined a vector space to be finite-dimensional if it has a finite spanning 
set. In this chapter we shall focus our attention on such spaces, although this 
restriction is unnecessary for some of our discussion. We shall see that we can 
assign to each finite-dimensional space $V$ a unique integer, called the dimension 
of $V$, which satisfies our intuitive requirements about dimensionality and which 
becomes a principal tool in the deeper explorations into the nature of such 
spaces. A number of “dimensional identities” are crucial in these further 
investigations. We shall find that the dual space of all linear functionals on $V$, 
$V^* = \operatorname{Hom}(V, \mathbb{R})$, plays a more satisfactory role in finite-dimensional theory 
than in the context of general vector spaces. (However, we shall see later in 
the book that when we add limit theory to our algebra, there are certain special 
infinite-dimensional vector spaces for which the dual space plays an equally 
important role.) A finite-dimensional space can be characterized as a vector 
space isomorphic to some Cartesian space $\mathbb{R}^n$, and such an isomorphism allows a 
transformation $T$ in $\operatorname{Hom} V$ to be “transferred” to $\mathbb{R}^n$, whereupon it acquires a 
matrix. The theory of linear transformations on such spaces is therefore mirrored 
completely by the theory of matrices. In this chapter we shall push much 
deeper into the nature of this relationship than we did in Chapter 1. We also 
include a section on matrix computations, a brief section describing the trace 
and determinant functions, and a short discussion of the diagonalization of a 
quadratic form. 


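1. BASES 

Consider again a fixed finite indexed set of vectors $\boldsymbol{\alpha} = \{\alpha_i : i \in I\}$ in $V$ and the 
corresponding linear combination map $L_{\boldsymbol{\alpha}}: \mathbf{x} \mapsto \sum x_i \alpha_i$ from $\mathbb{R}^I$ to $V$ having $\boldsymbol{\alpha}$ 
as skeleton. 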


Definition. The finite indexed set $\{\alpha_i : i \in I\}$ is independent if the above 
mapping $L_{\boldsymbol{\alpha}}$ is injective, and $\{\alpha_i\}$ is a basis for $V$ if $L_{\boldsymbol{\alpha}}$ is an isomorphism 
(onto $V$). In this situation we call $\{\alpha_i : i \in I\}$ an ordered basis or frame if 
$I = \bar{n} = \{1, \ldots, n\}$ for some positive integer $n$. 


Thus $\{\alpha_i : i \in I\}$ is a basis if and only if for each $\xi \in V$ there exists a unique 
indexed “coefficient” set $\mathbf{x} = \{x_i : i \in I\} \in \mathbb{R}^I$ such that $\xi = \sum x_i \alpha_i$. The 
numbers $x_i$ always exist because $\{\alpha_i : i \in I\}$ spans $V$, and $\mathbf{x}$ is unique because 
$L_{\boldsymbol{\alpha}}$ is injective. 

For example, we can check directly that $\mathbf{b}^1 = \langle 2, 1 \rangle$ and $\mathbf{b}^2 = \langle 1, -3 \rangle$ 
form a basis for $\mathbb{R}^2$. The problem is to show that for each $\mathbf{y} \in \mathbb{R}^2$ there is a 
unique $\mathbf{x}$ such that 

$$\mathbf{y} = \sum_{1}^{2} x_i \mathbf{b}^i = x_1 \langle 2, 1 \rangle + x_2 \langle 1, -3 \rangle = \langle 2x_1 + x_2,\; x_1 - 3x_2 \rangle.$$

Since this vector equation is equivalent to the two scalar equations $y_1 = 
2x_1 + x_2$ and $y_2 = x_1 - 3x_2$, we can find the unique solution $x_1 = (3y_1 + y_2)/7$, 
$x_2 = (y_1 - 2y_2)/7$ by the usual elimination method of secondary school 
algebra. 
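
The formulas just obtained give the coordinate isomorphism for this basis explicitly, and they are easy to check by machine. The following sketch uses exact rational arithmetic; the function names `coordinates` and `combine` are our own labels for the inverse map and for the linear combination map of this particular basis.

```python
from fractions import Fraction

def combine(x):
    # Linear combination map: x -> x1*<2, 1> + x2*<1, -3>.
    x1, x2 = x
    return (2 * x1 + x2, x1 - 3 * x2)

def coordinates(y):
    # Inverse map, from the elimination above:
    # x1 = (3*y1 + y2)/7, x2 = (y1 - 2*y2)/7.
    y1, y2 = y
    return (Fraction(3 * y1 + y2) / 7, Fraction(y1 - 2 * y2) / 7)

if __name__ == "__main__":
    y = (5, -1)
    x = coordinates(y)
    print(x, combine(x))   # combine(coordinates(y)) recovers y
```

Applying `coordinates` to the basis vectors themselves returns the standard coordinate pairs, as it must.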

The form of these definitions is dictated by our interpretation of the linear 
combination formula as a linear mapping. The more usual definition of independence 
is a corollary. 


Lemma 1.1. The independence of the finite indexed set $\{\alpha_i : i \in I\}$ is 
equivalent to the property that $\sum_I x_i \alpha_i = 0$ only if all the coefficients $x_i$ 
are 0. 


Proof. This is the property that the null space of $L_{\boldsymbol{\alpha}}$ consist only of 0, and is 
thus equivalent to the injectivity of $L_{\boldsymbol{\alpha}}$, that is, to the independence of $\{\alpha_i\}$, by 
Lemma 1.1 of Chapter 1. □ 


If $\{\alpha_i\}_1^n$ is an ordered basis (frame) for $V$, the unique $n$-tuple $\mathbf{x}$ such that 
$\xi = \sum_1^n x_i \alpha_i$ is called the coordinate $n$-tuple of $\xi$ (with respect to the basis $\{\alpha_i\}$), 
and $x_i$ is the $i$th coordinate of $\xi$. We call $x_i \alpha_i$ (and sometimes $x_i$) the $i$th component 
of $\xi$. The mapping $L_{\boldsymbol{\alpha}}$ will be called a basis isomorphism, and its inverse $L_{\boldsymbol{\alpha}}^{-1}$, 
which assigns to each vector $\xi \in V$ its unique coordinate $n$-tuple $\mathbf{x}$, is a coordinate 
isomorphism. The linear functional $\xi \mapsto x_j$ is the $j$th coordinate functional; 
it is the composition of the coordinate isomorphism $\xi \mapsto \mathbf{x}$ with the $j$th coordinate 
projection $\mathbf{x} \mapsto x_j$ on $\mathbb{R}^n$. We shall see in Section 3 that the $n$ coordinate 
functionals form a basis for $V^* = \operatorname{Hom}(V, \mathbb{R})$. 

In the above paragraph we took the index set $I$ to be $\bar{n} = \{1, \ldots, n\}$ and 
used the language of $n$-tuples. The only difference for an arbitrary finite index 
set is that we speak of a coordinate function $\mathbf{x} = \{x_i : i \in I\}$ instead of a coordinate 
$n$-tuple. 

Our first concern will be to show that every finite-dimensional (finitely 
spanned) vector space has a basis. We start with some remarks about indices. 

We note first that a finite indexed set $\{\alpha_i : i \in I\}$ can be independent only 
if the indexing is injective as a mapping into $V$, for if $\alpha_k = \alpha_l$, then $\sum x_i \alpha_i = 0$, 
where $x_k = 1$, $x_l = -1$, and $x_i = 0$ for the remaining indices. Also, if $\{\alpha_i : i \in I\}$ 
is independent and $J \subset I$, then $\{\alpha_i : i \in J\}$ is independent, since if $\sum_J x_i \alpha_i = 0$, 
and if we set $x_i = 0$ for $i \in I - J$, then $\sum_I x_i \alpha_i = 0$, and so each $x_i$ is 0. 
A finite unindexed set is said to be independent if it is independent in some 
(necessarily bijective) indexing. It will of course then be independent with 
respect to any bijective indexing. An arbitrary set is independent if every finite 
subset is independent. It follows that a set $A$ is dependent (not independent) if 
and only if there exist distinct elements $\alpha_1, \ldots, \alpha_n$ in $A$ and scalars $x_1, \ldots, x_n$ 
not all zero such that $\sum_1^n x_i \alpha_i = 0$. An unindexed basis would be defined in the 
obvious way. However, a set can always be regarded as being indexed, by itself 
if necessary! 


Lemma 1.2. If $B$ is an independent subset of a vector space $V$ and $\beta$ is any 
vector not in the linear span $L(B)$, then $B \cup \{\beta\}$ is independent. 


Proof. Otherwise there is a zero linear combination, $x\beta + \sum_1^n x_i \beta_i = 0$, where 
$\beta_1, \ldots, \beta_n$ are distinct elements of $B$ and the coefficients are not all 0. But then 
$x$ cannot be zero: if it were, the equation would contradict the independence of 
$B$. We can therefore divide by $x$ and solve for $\beta$, so that $\beta \in L(B)$, a contradiction. □ 


The reader will remember that we call a vector space $V$ finite-dimensional 
if it has a finite spanning set $\{\alpha_i\}_1^n$. We can use the above lemma to construct a 
basis for such a $V$ by choosing some of the $\alpha_i$'s. We simply run through the 
sequence $\{\alpha_i\}_1^n$ and choose those members that increase the linear span of the 
preceding choices. We end up with a spanning set since $\{\alpha_i\}_1^n$ spans, and our 
subsequence is independent at each step, by the lemma. In the same way we 
can extend an independent set $\{\beta_j\}_1^m$ to a basis by choosing some members of a 
spanning set $\{\alpha_i\}_1^n$. This procedure is intuitive, but it is messy to set up rigorously. 
We shall therefore proceed differently. 
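
Although the text does not pursue it, the run-through procedure just described is easy to carry out by machine in $\mathbb{R}^n$. The sketch below is one possible rendering under our own assumptions: vectors are tuples of rationals, "increases the linear span" is tested by comparing ranks computed with Gaussian elimination, and the names `rank` and `sift` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    # Rank of a list of vectors in R^n, by Gauss-Jordan elimination over the rationals.
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def sift(spanning, start=()):
    # Keep each member of `spanning` that enlarges the span of the vectors chosen so far.
    # `start` is an independent list to be extended (empty by default).
    chosen = list(start)
    for v in spanning:
        if rank(chosen + [v]) > rank(chosen):
            chosen.append(v)
    return chosen

if __name__ == "__main__":
    spanning = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
    print(sift(spanning))
    print(sift([(1, 0, 0), (0, 1, 0), (0, 0, 1)], start=[(1, 1, 1)]))
```

By Lemma 1.2 the chosen vectors are independent at every step, and they span whatever the original list spans, so the result is a basis for that span.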


Theorem 1.1. Any minimal finite spanning set is a basis, and therefore any 
finite-dimensional vector space $V$ has a basis. More generally, if $\{\beta_j : j \in J\}$ 
is a finite independent set and $\{\alpha_i : i \in I\}$ is a finite spanning set, and if $K$ 
is a smallest subset of $I$ such that $\{\beta_j\}_J \cup \{\alpha_i\}_K$ spans, then this collection is 
independent and a basis. Therefore, any finite independent subset of a 
finite-dimensional space can be extended to a basis. 


Proof. It is sufficient to prove the second assertion, since it includes the first 
as a special case. If $\{\beta_j\}_J \cup \{\alpha_i\}_K$ is not independent, then there is a nontrivial 
zero linear combination $\sum_J y_j \beta_j + \sum_K x_i \alpha_i = 0$. If every $x_i$ were zero, this 
equation would contradict the independence of $\{\beta_j\}_J$. Therefore, some $x_k$ is 
not zero, and we can solve the equation for $\alpha_k$. That is, if we set $L = K - \{k\}$, 
then the linear span of $\{\beta_j\}_J \cup \{\alpha_i\}_L$ contains $\alpha_k$. It therefore includes the whole 
original spanning set and hence is $V$. But this contradicts the minimal nature of 
$K$, since $L$ is a proper subset of $K$. Consequently, $\{\beta_j\}_J \cup \{\alpha_i\}_K$ is independent. □ 


We next note that $\mathbb{R}^n$ itself has a very special basis. In the indexing map 
$i \mapsto \alpha_i$ the vector $\alpha_j$ corresponds to the index $j$, but under the linear combination 
map $\mathbf{x} \mapsto \sum x_i \alpha_i$ the vector $\alpha_j$ corresponds to the function $\delta^j$ which has 
the value 1 at $j$ and the value 0 elsewhere, so that $\sum_i \delta^j_i \alpha_i = \alpha_j$. This function 
$\delta^j$ is called a Kronecker delta function. It is clearly the characteristic function $\chi_B$ 
of the one-point set $B = \{j\}$, and the symbol ‘$\delta^j$’ is ambiguous, just as ‘$\chi_B$’ 
is ambiguous; in each case the meaning depends on what domain is implicit from 
the context. We have already used the delta functions on $\mathbb{R}^n$ in proving Theorem 
1.2 of Chapter 1. 



Theorem 1.2. The Kronecker functions $\{\delta^j\}_{j=1}^n$ form a basis for $\mathbb{R}^n$. 


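Proof. Since $\sum_i x_i \delta^i(j) = x_j$ by the definition of $\delta^i$, we see that $\sum_1^n x_i \delta^i$ is 
the $n$-tuple $\mathbf{x}$ itself, so the linear combination mapping $L_\delta: \mathbf{x} \mapsto \sum_1^n x_i \delta^i$ is the 
identity mapping $\mathbf{x} \mapsto \mathbf{x}$, a trivial isomorphism. □ 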


Among all possible indexed bases for $\mathbb{R}^n$, the Kronecker basis is thus singled 
out by the fact that its basis isomorphism is the identity; for this reason it is 
called the standard basis or the natural basis for $\mathbb{R}^n$. The same holds for $\mathbb{R}^I$ for 
any finite set $I$. 

Finally, we shall draw some elementary conclusions from the existence of 
a basis. 


Theorem 1.3. If $T \in \mathrm{Hom}(V, W)$ is an isomorphism and $\alpha = \{\alpha_i : i \in I\}$ 
is a basis for $V$, then $\{T(\alpha_i) : i \in I\}$ is a basis for $W$. 


Proof. By hypothesis $L_\alpha$ is an isomorphism in $\mathrm{Hom}(\mathbb{R}^n, V)$, and so $T \circ L_\alpha$ is 
an isomorphism in $\mathrm{Hom}(\mathbb{R}^n, W)$. Its skeleton $\{T(\alpha_i)\}$ is therefore a basis for $W$. □ 


We can view any basis $\{\alpha_i\}$ as the image of the standard basis $\{\delta^i\}$ under 
the basis isomorphism. Conversely, any isomorphism $\theta: \mathbb{R}^I \to V$ becomes a basis 
isomorphism for the basis $\alpha_j = \theta(\delta^j)$. 


Theorem 1.4. If $X$ and $Y$ are complementary subspaces of a vector space $V$, 
then the union of a basis for $X$ and a basis for $Y$ is a basis for $V$. Conversely, 
if a basis for $V$ is partitioned into two sets, with linear spans $X$ and $Y$, 
respectively, then $X$ and $Y$ are complementary subspaces of $V$. 


Proof. We prove only the first statement. If $\{\alpha_i : i \in J\}$ is a basis for $X$ and 
$\{\alpha_i : i \in K\}$ is a basis for $Y$, then it is clear that $\{\alpha_i : i \in J \cup K\}$ spans $V$, since 
its span includes both $X$ and $Y$, and so $X + Y = V$. Suppose then that 
$\sum_{J \cup K} x_i\alpha_i = 0$. Setting $\xi = \sum_J x_i\alpha_i$ and $\eta = \sum_K x_i\alpha_i$, we see that $\xi \in X$, 
$\eta \in Y$, and $\xi + \eta = 0$. But then $\xi = \eta = 0$, since $X$ and $Y$ are complementary. 
And then $x_i = 0$ for $i \in J$ because $\{\alpha_i\}_J$ is independent, and $x_i = 0$ for $i \in K$ 
because $\{\alpha_i\}_K$ is independent. Therefore, $\{\alpha_i\}_{J \cup K}$ is a basis for $V$. We leave the 
converse argument as an exercise. □ 


Corollary. If $V = \bigoplus_1^n V_i$ and $B_i$ is a basis for $V_i$, then $B = \bigcup_1^n B_i$ is a 
basis for $V$. 

Proof. We see from the theorem that $B_1 \cup B_2$ is a basis for $V_1 \oplus V_2$. Proceed- 
ing inductively we see that $\bigcup_{i=1}^j B_i$ is a basis for $\bigoplus_{i=1}^j V_i$ for $j = 2, \ldots, n$, 
and the corollary is the case $j = n$. □ 




If we follow a coordinate isomorphism by a linear combination map, we get 
the mapping of the following existence theorem, which we state only in n-tuple 
form. 


Theorem 1.5. If $\beta = \{\beta_i\}_1^n$ is an ordered basis for the vector space $V$, and 
if $\{\alpha_i\}_1^n$ is any n-tuple of vectors in a vector space $W$, then there exists a 
unique $S \in \mathrm{Hom}(V, W)$ such that $S(\beta_i) = \alpha_i$ for $i = 1, \ldots, n$. 


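Proof. By hypothesis $L_\beta$ is an isomorphism in $\mathrm{Hom}(\mathbb{R}^n, V)$, and so 
$S = L_\alpha \circ (L_\beta)^{-1}$ is an element of $\mathrm{Hom}(V, W)$ such that $S(\beta_i) = L_\alpha(\delta^i) = \alpha_i$. 
Conversely, if $S \in \mathrm{Hom}(V, W)$ is such that $S(\beta_i) = \alpha_i$ for all $i$, then $S \circ L_\beta(\delta^i) = 
\alpha_i$ for all $i$, so that $S \circ L_\beta = L_\alpha$. Thus $S$ is uniquely determined as $L_\alpha \circ (L_\beta)^{-1}$. □ 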
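
The formula $S = L_\alpha \circ (L_\beta)^{-1}$ can be carried out numerically. The sketch below 
assumes $V = \mathbb{R}^2$ and $W = \mathbb{R}^3$ with hypothetical sample vectors; the names 
`coordinates` and `linear_map_from_basis` are ours, and $(L_\beta)^{-1}$ is computed by 
Gauss-Jordan elimination in exact rational arithmetic. 

```python
from fractions import Fraction

def coordinates(basis, v):
    """Solve sum_i x_i * basis[i] = v; `basis` is assumed to be a basis of R^n."""
    n = len(basis)
    # Columns of the coefficient matrix are the basis vectors; v is appended.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [e / p for e in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def linear_map_from_basis(beta, alpha):
    """Return S with S(beta[i]) = alpha[i], built as L_alpha o (L_beta)^{-1}."""
    def S(v):
        x = coordinates(beta, v)                      # (L_beta)^{-1}(v)
        m = len(alpha[0])
        return [sum(x[i] * alpha[i][k] for i in range(len(alpha)))
                for k in range(m)]                    # L_alpha(x)
    return S

beta = [(2, 1), (1, 1)]           # sample ordered basis of R^2
alpha = [(1, 0, 3), (0, 1, -1)]   # arbitrary pair of vectors in R^3
S = linear_map_from_basis(beta, alpha)
```

By construction `S` sends each `beta[i]` to `alpha[i]`, and its value on any other 
vector is then forced by linearity, which is the uniqueness half of the theorem. 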


It is natural to ask how the unique $S$ above varies with the n-tuple $\{\alpha_i\}$. 
The answer is: linearly and "isomorphically". 


Theorem 1.6. Let $\{\beta_i\}_1^n$ be a fixed ordered basis for the vector space 
$V$, and for each n-tuple $\alpha = \{\alpha_i\}_1^n$ chosen from the vector space $W$ let 
$S_\alpha \in \mathrm{Hom}(V, W)$ be the unique transformation defined above. Then the 
map $\alpha \mapsto S_\alpha$ is an isomorphism from $W^n$ to $\mathrm{Hom}(V, W)$. 


Proof. As above, $S_\alpha = L_\alpha \circ \theta^{-1}$, where $\theta$ is the basis isomorphism $L_\beta$. Now we 
know from Theorem 6.2 of Chapter 1 that $\alpha \mapsto L_\alpha$ is an isomorphism from $W^n$ 
to $\mathrm{Hom}(\mathbb{R}^n, W)$, and composition on the right by the fixed coordinate isomor- 
phism $\theta^{-1}$ is an isomorphism from $\mathrm{Hom}(\mathbb{R}^n, W)$ to $\mathrm{Hom}(V, W)$ by the corollary 
to Theorem 3.3 of Chapter 1. Composing these two isomorphisms gives us the 
theorem. □ 


*Infinite bases. Most vector spaces do not have finite bases, and it is natural to 
try to extend the above discussion to index sets $I$ that may be infinite. The 
Kronecker functions $\{\delta^i : i \in I\}$ have the same definitions, but they no longer 
span $\mathbb{R}^I$. By definition $f$ is a linear combination of the functions $\delta^i$ if and only 
if $f$ is of the form $\sum_{i \in I_1} c_i\delta^i$, where $I_1$ is a finite subset of $I$. But then $f = 0$ 
outside of $I_1$. Conversely, if $f \in \mathbb{R}^I$ is 0 except on a finite set $I_1$, then $f = 
\sum_{i \in I_1} f(i)\delta^i$. The linear span of $\{\delta^i : i \in I\}$ is thus exactly the set of all func- 
tions of $\mathbb{R}^I$ that are zero except on a finite set. We shall designate this sub- 
space $\mathbb{R}_I$. 

If $\{\alpha_i : i \in I\}$ is an indexed set of vectors in $V$ and $f \in \mathbb{R}_I$, then the sum 
$\sum_{i \in I} f(i)\alpha_i$ becomes meaningful if we adopt the reasonable convention that the 
sum of an arbitrary number of 0's is 0. Then $\sum_{i \in I} = \sum_{i \in I_0}$, where $I_0$ is any 
finite subset of $I$ outside of which $f$ is zero. 

With this convention, $L_\alpha: f \mapsto \sum_i f(i)\alpha_i$ is a linear map from $\mathbb{R}_I$ to $V$, as in 
Theorem 1.2 of Chapter 1. And with the same convention, $\sum_{i \in I} f(i)\alpha_i$ is an 
elegant expression for the general linear combination of the vectors $\alpha_i$. Instead 
of choosing a finite subset $I_1$ and numbers $c_i$ for just those indices $i$ in $I_1$, we 
define $c_i$ for all $i \in I$, but with the stipulation that $c_i = 0$ for all but a finite 
number of indices. That is, we take $c = \{c_i : i \in I\}$ as a function in $\mathbb{R}_I$. 
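
The summation convention is easy to mimic on a machine. In the Python sketch below, 
a function in $\mathbb{R}_I$ is stored as a dictionary holding only its nonzero values, so 
every sum is over a finite support. This representation and the names are our own 
assumptions; for simplicity the vectors $\alpha_i$ are taken to be the numbers $2^i$ 
(so $V = \mathbb{R}$ here), which illustrates the convention and the linearity of 
$L_\alpha$ but is of course not a basis. 

```python
def linear_combination(f, alpha):
    """L_alpha(f) = sum of f(i) * alpha(i) over the finite support of f."""
    total = 0
    for i, c in f.items():
        if c != 0:               # zero terms contribute nothing, by convention
            total += c * alpha(i)
    return total

def add(f, g):
    """Pointwise sum of two finitely supported functions stored as dicts."""
    h = dict(f)
    for i, c in g.items():
        h[i] = h.get(i, 0) + c
    return {i: c for i, c in h.items() if c != 0}

alpha = lambda i: 2 ** i         # sample indexed family over I = {0, 1, 2, ...}
f = {0: 1, 3: 2}                 # f(0) = 1, f(3) = 2, zero elsewhere
g = {3: -2, 5: 1}
value_f = linear_combination(f, alpha)
value_g = linear_combination(g, alpha)
value_sum = linear_combination(add(f, g), alpha)
```

Although the index set is infinite, each evaluation touches only finitely many indices, 
and `value_sum` equals `value_f + value_g`. 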




We make the same definitions of independence and basis as before. Then 
$\{\alpha_i : i \in I\}$ is a basis for $V$ if and only if $L_\alpha: \mathbb{R}_I \to V$ is an isomorphism, i.e., if 
and only if for each $\xi \in V$ there exists a unique $x \in \mathbb{R}_I$ such that $\xi = \sum_i x_i\alpha_i$. 

By using an axiom of set theory called the axiom of choice, it can be shown 
that every vector space has a basis in this sense and that any independent set 
can be extended to a basis. Then Theorems 1.4 and 1.5 hold with only minor 
changes in notation. In particular, if a basis for a subspace $M$ of $V$ is extended 
to a basis for $V$, then the linear span of the added part is a subspace $N$ comple- 
mentary to $M$. Thus, in a purely algebraic sense, every subspace has com- 
plementary subspaces. We assume this fact in some of our exercises. 

The above sums are always finite (despite appearances), and the above 
notion of basis is purely algebraic. However, infinite bases in this sense are not 
very useful in analysis, and we shall therefore concentrate for the present on 
spaces that have finite bases (i.e., are finite-dimensional). Then in one impor- 
tant context later on we shall discuss infinite bases where the sums are genuinely 
infinite by virtue of limit theory. 


EXERCISES 


1.1 Show by a direct computation that $\{\langle 1, -1\rangle, \langle 0, 1\rangle\}$ is a basis for $\mathbb{R}^2$. 

1.2 The student must realize that the ith coordinate of a vector depends on the whole 
basis and not just on the ith basis vector. Prove this for the second coordinate of 
vectors in $\mathbb{R}^2$ using the standard basis and the basis of the above exercise. 

1.3 Show that $\{\langle 1, 1\rangle, \langle 1, 2\rangle\}$ is a basis for $V = \mathbb{R}^2$. The basis isomorphism 
from $\mathbb{R}^2$ to $V$ is now from $\mathbb{R}^2$ to $\mathbb{R}^2$. Find its matrix. Find the matrix of the coordinate 
isomorphism. Compute the coordinates, with respect to this basis, of $\langle -1, 1\rangle$, $\langle 0, 1\rangle$, 
$\langle 2, 3\rangle$. 

1.4 Show that $\{b^i\}_1^3$, where $b^1 = \langle 1, 0, 0\rangle$, $b^2 = \langle 1, 1, 0\rangle$, and $b^3 = 
\langle 1, 1, 1\rangle$, is a basis for $\mathbb{R}^3$. 

1.5 In the above exercise find the three linear functionals $l_i$ that are the coordinate 
functionals with respect to the given basis. Since 

$$x = \sum_{i=1}^{3} l_i(x)\, b^i,$$

finding the $l_i$ is equivalent to solving $x = \sum_1^3 y_i b^i$ for the $y_i$'s in terms of 
$x = \langle x_1, x_2, x_3\rangle$. 

1.6 Show that any set of polynomials no two of which have the same degree is 
independent. 

1.7 Show that if $\{\alpha_i\}_1^n$ is an independent subset of $V$ and $T$ in $\mathrm{Hom}(V, W)$ is injec- 
tive, then $\{T(\alpha_i)\}_1^n$ is an independent subset of $W$. 

1.8 Show that if $T$ is any element of $\mathrm{Hom}(V, W)$ and $\{T(\alpha_i)\}_1^n$ is independent in $W$, 
then $\{\alpha_i\}_1^n$ is independent in $V$. 




1.9 Later on we are going to call a vector space $V$ n-dimensional if every basis for 
$V$ contains exactly $n$ elements. If $V$ is the span of a single vector $\alpha$, so that $V = \mathbb{R}\alpha$, 
then $V$ is clearly one-dimensional. 

Let $\{V_i\}_1^n$ be a collection of one-dimensional subspaces of a vector space $V$, and 
choose a nonzero vector $\alpha_i$ in $V_i$ for each $i$. Prove that $\{\alpha_i\}_1^n$ is independent if and 
only if the subspaces $\{V_i\}_1^n$ are independent and that $\{\alpha_i\}_1^n$ is a basis if and only if 
$V = \bigoplus_1^n V_i$. 

1.10 Finish the proof of Theorem 1.4. 

1.11 Give a proof of Theorem 1.4 based on the existence of isomorphisms. 

1.12 The reader would guess, and we shall prove in the next section, that every 
subspace of a finite-dimensional space is finite-dimensional. Prove now that a sub- 
space $N$ of a finite-dimensional vector space $V$ is finite-dimensional if and only if it has 
a complement $M$. (Work from a combination of Theorems 1.1 and 1.4 and direct sum 
projections.) 

1.13 Since $\{b^i\}_1^3 = \{\langle 1, 0, 0\rangle, \langle 1, 1, 0\rangle, \langle 1, 1, 1\rangle\}$ is a basis for $\mathbb{R}^3$, there is a 
unique $T$ in $\mathrm{Hom}(\mathbb{R}^3, \mathbb{R}^2)$ such that $T(b^1) = \langle 1, 0\rangle$, $T(b^2) = \langle 0, 1\rangle$, and 
$T(b^3) = \langle 1, 1\rangle$. Find the matrix of $T$. (Find $T(\delta^i)$ for $i = 1, 2, 3$.) 

1.14 Find, similarly, the $S$ in $\mathrm{Hom}\,\mathbb{R}^3$ such that $S(b^i) = \delta^i$ for $i = 1, 2, 3$. 

1.15 Show that the infinite sequence $\{t^n\}_0^\infty$ is a basis for the vector space of all poly- 
nomials. 


2. DIMENSION 


The concept of dimension rests on the fact that two different bases for the same 
space always contain the same number of elements. This number, which is 
then the number of elements in every basis for $V$, is called the dimension of $V$. 
It tells all there is to know about V to within isomorphism: There exists an 
isomorphism between two spaces if and only if they have the same dimension. 
We shall consider only finite dimensions. If V is not finite-dimensional, its 
dimension is an infinite cardinal number, a concept with which the reader is 
probably unfamiliar. 


Lemma 2.1. If $V$ is finite-dimensional and $T$ in $\mathrm{Hom}\,V$ is surjective, then $T$ 
is an isomorphism. 


Proof. Let $n$ be the smallest number of elements that can span $V$. That is, there 
is some spanning set $\{\alpha_i\}_1^n$ and none with fewer than $n$ elements. Then $\{\alpha_i\}_1^n$ is a 
basis, by Theorem 1.1, and the linear combination map $\theta: x \mapsto \sum_1^n x_i\alpha_i$ is 
accordingly a basis isomorphism. But $\{\beta_i\}_1^n = \{T(\alpha_i)\}_1^n$ also spans, since $T$ is 
surjective, and so $T \circ \theta$ is also a basis isomorphism, for the same reason. Then 
$T = (T \circ \theta) \circ \theta^{-1}$ is an isomorphism. □ 


Theorem 2.1. If $V$ is finite-dimensional, then all bases for $V$ contain the 
same number of elements. 


Proof. Two bases with $n$ and $m$ elements determine basis isomorphisms 
$\theta: \mathbb{R}^n \to V$ and $\varphi: \mathbb{R}^m \to V$. Suppose that $m < n$ and, viewing $\mathbb{R}^n$ as $\mathbb{R}^m \times \mathbb{R}^{n-m}$, 
let $\pi$ be the projection of $\mathbb{R}^n$ onto $\mathbb{R}^m$, 

$$\pi(\langle x_1, \ldots, x_m, \ldots, x_n\rangle) = \langle x_1, \ldots, x_m\rangle.$$

Since $T = \theta^{-1} \circ \varphi$ is an isomorphism from $\mathbb{R}^m$ to $\mathbb{R}^n$ and $T \circ \pi: \mathbb{R}^n \to \mathbb{R}^n$ is 
therefore surjective, it follows from the lemma that $T \circ \pi$ is an isomorphism. 
Then $\pi = T^{-1} \circ (T \circ \pi)$ is an isomorphism. But it isn't, because $\pi(\delta^n) = 0$, 
and we have a contradiction. Therefore no basis can be smaller than any other 
basis. □ 


The integer that is the number of elements in every basis for $V$ is of course 
called the dimension of $V$, and we designate it $d(V)$. Since the standard basis 
$\{\delta^i\}_1^n$ for $\mathbb{R}^n$ has $n$ elements, we see that $\mathbb{R}^n$ is n-dimensional in this precise sense. 


Corollary. Two finite-dimensional vector spaces are isomorphic if and only 
if they have the same dimension. 


Proof. If $T$ is an isomorphism from $V$ to $W$ and $B$ is a basis for $V$, then $T[B]$ is a 
basis for $W$ by Theorem 1.3. Therefore $d(V) = \#B = \#T[B] = d(W)$, where 
$\#A$ is the number of elements in $A$. Conversely, if $d(V) = d(W) = n$, then $V$ 
and $W$ are each isomorphic to $\mathbb{R}^n$ and so to each other. □ 


Theorem 2.2. Every subspace $M$ of a finite-dimensional vector space $V$ is 
finite-dimensional. 


Proof. Let $\mathcal{A}$ be the family of finite independent subsets of $M$. By Theorem 1.1, 
if $A \in \mathcal{A}$, then $A$ can be extended to a basis for $V$, and so $\#A \le d(V)$. Thus 
$\{\#A : A \in \mathcal{A}\}$ is a finite set of integers, and we can choose $B \in \mathcal{A}$ such that 
$n = \#B$ is the maximum of this finite set. But then $L(B) = M$, because other- 
wise for any $\alpha \in M - L(B)$ we have $B \cup \{\alpha\} \in \mathcal{A}$, by Lemma 1.2, and 

$$\#(B \cup \{\alpha\}) = n + 1,$$

contradicting the maximal nature of $n$. Thus $M$ is finitely spanned. □ 


Corollary. Every subspace $M$ of a finite-dimensional space $V$ has a comple- 
ment. 


Proof. Use Theorem 1.1 to extend a basis for $M$ to a basis for $V$, and let $N$ be 
the linear span of the added vectors. Then apply Theorem 1.4. □ 


Dimensional identities. We now prove two basic dimensional identities. 
We will always assume V finite-dimensional. 


Lemma 2.2. If $V_1$ and $V_2$ are complementary subspaces of $V$, then $d(V) = 
d(V_1) + d(V_2)$. More generally, if $V = \bigoplus_1^n V_i$ then $d(V) = \sum_1^n d(V_i)$. 


Proof. This follows at once from Theorem 1.4 and its corollary. □ 


Theorem 2.3. If $U$ and $W$ are subspaces of a finite-dimensional vector space, 
then $d(U + W) + d(U \cap W) = d(U) + d(W)$. 




Proof. Let $V$ be a complement of $U \cap W$ in $U$. We start by showing that then $V$ 
is also a complement of $W$ in $U + W$. First 

$$V + W = V + ((U \cap W) + W) = (V + (U \cap W)) + W = U + W.$$

We have used the obvious fact that the sum of a vector space and a subspace 
is the vector space. Next, 

$$V \cap W = (V \cap U) \cap W = V \cap (U \cap W) = \{0\},$$

because $V$ is a complement of $U \cap W$ in $U$. We thus have both $V + W = 
U + W$ and $V \cap W = \{0\}$, and so $V$ is a complement of $W$ in $U + W$ by 
the corollary of Lemma 5.2 of Chapter 1. 

The theorem is now a corollary of the above lemma. We have 

$$d(U) + d(W) = (d(U \cap W) + d(V)) + d(W) = d(U \cap W) + (d(V) + d(W)) = d(U \cap W) + d(U + W). \quad □$$
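
A small instance may help fix the identity. The example below is our own choice of 
subspaces of $\mathbb{R}^3$, spanned by standard basis vectors: 

```latex
\begin{align*}
U &= L(\{\delta^1, \delta^2\}), \qquad W = L(\{\delta^2, \delta^3\}),\\
U + W &= \mathbb{R}^3, \qquad U \cap W = L(\{\delta^2\}),\\
d(U + W) + d(U \cap W) &= 3 + 1 = 2 + 2 = d(U) + d(W).
\end{align*}
```

Here $V = L(\{\delta^1\})$ is a complement of $U \cap W$ in $U$, and it is also a complement 
of $W$ in $U + W$, exactly as in the proof. 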


Theorem 2.4. Let $V$ be finite-dimensional, and let $W$ be any vector space. 
Let $T \in \mathrm{Hom}(V, W)$ have null space $N$ (in $V$) and range $R$ (in $W$). Then $R$ 
is finite-dimensional and $d(V) = d(N) + d(R)$. 


Proof. Let $U$ be a complement of $N$ in $V$. Then we know that $T \restriction U$ is an 
isomorphism onto $R$. (See Theorem 5.3 of Chapter 1.) Therefore, $R$ is finite- 
dimensional and $d(R) + d(N) = d(U) + d(N) = d(V)$ by our first identity. □ 

Corollary. If $W$ is finite-dimensional and $d(W) = d(V)$, then $T$ is injective 
if and only if it is surjective, so that in this case injectivity, surjectivity, and 
bijectivity are all equivalent. 

Proof. $T$ is surjective if and only if $R = W$. But this is equivalent to $d(R) = 
d(W)$, and if $d(W) = d(V)$, then the theorem shows this in turn to be equivalent 
to $d(N) = 0$, that is, to $N = \{0\}$. □ 
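
The identity $d(V) = d(N) + d(R)$ can be observed on a concrete matrix. The Python 
sketch below is an illustration under assumptions of our own: $T$ is a map from 
$\mathbb{R}^4$ to $\mathbb{R}^3$ given by a sample matrix, $d(R)$ is read off as the number of pivots 
after row reduction (a standard fact not proved at this point of the text), and a 
basis of $N$ is built from the free columns. The function names are hypothetical. 

```python
from fractions import Fraction

def rref(matrix):
    """Row-reduce a matrix (list of rows); return (reduced rows, pivot columns)."""
    rows = [[Fraction(e) for e in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][c]
        rows[r] = [e / p for e in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def null_space_basis(matrix):
    """One basis vector of the null space for each non-pivot (free) column."""
    rows, pivots = rref(matrix)
    ncols = len(matrix[0])
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for r, pc in enumerate(pivots):
            v[pc] = -rows[r][free]
        basis.append(v)
    return basis

T = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]        # third row is the sum of the first two
rank = len(rref(T)[1])    # d(R)
nullity = len(null_space_basis(T))  # d(N)
```

For this matrix `rank + nullity` equals 4, the dimension of the domain. 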


Theorem 2.5. If $d(V) = n$ and $d(W) = m$, then $\mathrm{Hom}(V, W)$ is finite- 
dimensional and its dimension is $mn$. 


Proof. By Theorem 1.6, $\mathrm{Hom}(V, W)$ is isomorphic to $W^n$, which is the direct 
sum of the $n$ subspaces isomorphic to $W$ under the injections $\theta_i$ for $i = 1, \ldots, n$. 
The dimension of $W^n$ is therefore $\sum_1^n m = mn$ by Lemma 2.2. □ 


Another proof of Theorem 2.5 will be available in Section 4. 


EXERCISES 


2.1 Prove that if $d(V) = n$, then any spanning subset of $n$ elements is a basis. 

2.2 Prove that if $d(V) = n$, then any independent subset of $n$ elements is a basis. 

2.3 Show that if $d(V) = n$ and $W$ is a subspace of the same dimension, then $W = V$. 




2.4 Prove by using dimensional identities that if $f$ is a nonzero linear functional on 
an n-dimensional space $V$, then its null space has dimension $n - 1$. 

2.5 Prove by using dimensional identities that if $f$ is a linear functional on a finite- 
dimensional space $V$, and if $\alpha$ is a vector not in its null space $N$, then $V = N \oplus \mathbb{R}\alpha$. 

2.6 Given that $N$ is an $(n - 1)$-dimensional subspace of an n-dimensional vector 
space $V$, show that $N$ is the null space of a linear functional. 

2.7 Let $X$ and $Y$ be subspaces of a finite-dimensional vector space $V$, and suppose 
that $T$ in $\mathrm{Hom}(V, W)$ has null space $N = X \cap Y$. Show that $T[X + Y] = T[X] \oplus 
T[Y]$, and then deduce Theorem 2.3 from Lemma 2.2 and Theorem 2.4. This proof 
still depends on the existence of a $T$ having $N = X \cap Y$ as its null space. Do we know 
of any such $T$? 

2.8 Show that if $V$ is finite-dimensional and $S, T \in \mathrm{Hom}\,V$, then 

$$S \circ T = I \implies T \text{ is invertible}.$$

Show also that $T \circ S = I \implies T$ is invertible. 

2.9 A subspace $N$ of a vector space $V$ has finite codimension $n$ if the quotient space 
$V/N$ is finite-dimensional, with dimension $n$. Show that a subspace $N$ has finite 
codimension $n$ if and only if it has a complementary subspace $M$ of dimension $n$. 
(Move a basis for $V/N$ back into $V$.) Do not assume $V$ to be finite-dimensional. 

2.10 Show that if $N_1$ and $N_2$ are subspaces of a vector space $V$ with finite codimen- 
sions, then $N = N_1 \cap N_2$ has finite codimension and 

$$\mathrm{cod}(N) \le \mathrm{cod}(N_1) + \mathrm{cod}(N_2).$$

(Consider the mapping $\xi \mapsto \langle \bar{\xi}_1, \bar{\xi}_2\rangle$ where $\bar{\xi}_i$ is the coset of $N_i$ containing $\xi$.) 

2.11 In the above exercise, suppose that $\mathrm{cod}(N_1) = \mathrm{cod}(N_2)$, that is, $d(V/N_1) = 
d(V/N_2)$. Prove that $d(N_1/N) = d(N_2/N)$. 

2.12 Given nonzero vectors $\beta$ in $V$ and $f$ in $V^*$ such that $f(\beta) \ne 0$, show that some 
scalar multiple of the mapping $\xi \mapsto f(\xi)\beta$ is a projection. Prove that any projection 
having a one-dimensional range arises in this way. 

2.13 We know that the choice of an origin $O$ in Euclidean 3-space $\mathbb{E}^3$ induces a 
vector space structure in $\mathbb{E}^3$ (under the correspondence $X \mapsto \overrightarrow{OX}$) and that this vector 
space is three-dimensional. Show that a geometric plane through $O$ becomes a two- 
dimensional subspace. 

2.14 An m-dimensional plane $M$ is a translate $N + \alpha_0$ of an m-dimensional subspace $N$. 
Let $\{\beta_i\}_1^m$ be any basis of $N$, and set $\alpha_i = \beta_i + \alpha_0$. Show that $M$ is exactly the set of 
linear combinations 

$$\sum_{i=0}^{m} x_i\alpha_i \quad\text{such that}\quad \sum_{0}^{m} x_i = 1.$$

2.15 Show that Exercise 2.14 is a corollary of Exercise 4.14 of Chapter 1. 

2.16 Show, conversely, that if a plane $M$ is the affine span of $m + 1$ elements, then 
its dimension is $\le m$. 

2.17 From the above two exercises concoct a direct definition of the dimension of an 
affine subspace. 




2.18 Write a small essay suggested by the following definition. An $(m + 1)$-tuple 
$\{\alpha_i\}_0^m$ is affinely independent if the conditions 

$$\sum_0^m x_i\alpha_i = 0 \quad\text{and}\quad \sum_0^m x_i = 0$$

together imply that $x_i = 0$ for all $i$. 


2.19 A polynomial on a vector space $V$ is a real-valued function on $V$ which can be 
represented as a finite sum of finite products of linear functionals. Define the degree 
of a polynomial; define a homogeneous polynomial of degree $k$. Show that the set of 
homogeneous polynomials of degree $k$ is a vector space $X_k$. 

2.20 Continuing the above exercise, show that if $k_1 < k_2 < \cdots < k_N$, then the 
vector spaces $\{X_{k_i}\}_1^N$ are independent subspaces of the vector space of all polynomials. 
[Assume that a polynomial $p(t)$ of a real variable can be the zero function only if all 
its coefficients are 0. For any polynomial $P$ on $V$ consider the polynomials $p_\alpha(t) = 
P(t\alpha)$.] 

2.21 Let $\langle \alpha, \beta\rangle$ be a basis for the two-dimensional space $V$, and let $\langle \lambda, \mu\rangle$ be the 
corresponding coordinate projections (dual basis in $V^*$). Show that every polynomial 
on $V$ "is a polynomial in the two variables $\lambda$ and $\mu$". 

2.22 Let $\langle \alpha, \beta\rangle$ be a basis for a two-dimensional vector space $V$, and let $\langle \lambda, \mu\rangle$ 
be the corresponding coordinate projections (dual basis for $V^*$). Show that 

$$\langle \lambda^2, \lambda\mu, \mu^2\rangle$$

is a basis for the vector space of homogeneous polynomials on $V$ of degree 2. Similarly, 
compute the dimension of the space of homogeneous polynomials of degree 3 on a 
two-dimensional vector space. 

2.23 Let $V$ and $W$ be two-dimensional vector spaces, and let $F$ be a mapping from 
$V$ to $W$. Using coordinate systems, define the notion of $F$ being quadratic and then 
show that it is independent of coordinate systems. Generalize the above exercise to 
higher dimensions and also to higher degrees. 

2.24 Now let $F: V \to W$ be a mapping between two-dimensional spaces such that 
for any $u, v \in V$ and any $l \in W^*$, $l(F(tu + v))$ is a quadratic function of $t$, that is, of 
the form $at^2 + bt + c$. Show that $F$ is quadratic according to your definition in the 
above exercises. 


3. THE DUAL SPACE 


Although throughout this section all spaces will be assumed finite-dimensional, 
many of the definitions and properties are valid for infinite-dimensional spaces 
as well. But for such spaces there is a difference between purely algebraic 
situations and situations in which algebra is mixed with hypotheses of continuity. 
One of the blessings of finite dimensionality is the absence of this complication. 
As the reader has probably surmised from the number of special linear functionals 
we have met, particularly the coordinate functionals, the space Hom(V, R) 
of all linear functionals on V plays a special role. 




Definition. The dual space (or conjugate space) $V^*$ of the vector space $V$ is 
the vector space $\mathrm{Hom}(V, \mathbb{R})$ of all linear mappings from $V$ to $\mathbb{R}$. Its elements 
are called linear functionals. 


We are going to see that in a certain sense $V$ is in turn the dual space of 
$V^*$ ($V$ and $(V^*)^*$ are naturally isomorphic), so that the two spaces are sym- 
metrically related. We shall briefly study the notion of annihilation (orthogonal- 
ity) which has its origins in this setting, and then see that there is a natural 
isomorphism between $\mathrm{Hom}(V, W)$ and $\mathrm{Hom}(W^*, V^*)$. This gives the mathema- 
tician a new tool to use in studying a linear transformation $T$ in $\mathrm{Hom}(V, W)$; 
the relationship between $T$ and its image $T^*$ exposes new properties of $T$ itself. 


Dual bases. At the outset one naturally wonders how big a space $V^*$ is, and we 
settle the question immediately. 


Theorem 3.1. Let $\{\beta_i\}_1^n$ be an ordered basis for $V$, and let $\varepsilon_j$ be the corre- 
sponding jth coordinate functional on $V$: $\varepsilon_j(\xi) = x_j$, where $\xi = \sum_1^n x_i\beta_i$. 
Then $\{\varepsilon_j\}_1^n$ is an ordered basis for $V^*$. 


Proof. Let us first make the proof by a direct elementary calculation. 

a) Independence. Suppose that $\sum_1^n c_j\varepsilon_j = 0$, that is, $\sum_1^n c_j\varepsilon_j(\xi) = 0$ for 
all $\xi \in V$. Taking $\xi = \beta_i$ and remembering that the coordinate n-tuple of $\beta_i$ 
is $\delta^i$, we see that the above equation reduces to $c_i = 0$, and this for all $i$. There- 
fore, $\{\varepsilon_j\}_1^n$ is independent. 

b) Spanning. First note that the basis expansion $\xi = \sum x_i\beta_i$ can be re- 
written $\xi = \sum \varepsilon_i(\xi)\beta_i$. Then for any $\lambda \in V^*$ we have $\lambda(\xi) = \sum_1^n l_i\varepsilon_i(\xi)$, 
where we have set $l_i = \lambda(\beta_i)$. That is, $\lambda = \sum l_i\varepsilon_i$. This shows that $\{\varepsilon_j\}_1^n$ spans 
$V^*$, and, together with (a), that it is a basis. □ 
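
The two halves of the proof can be checked numerically. In the Python sketch below 
we assume $V = \mathbb{R}^2$ with a sample basis of our own choosing; each coordinate 
functional is represented as a Python function, and `coordinates` (a hypothetical 
helper) solves for the coordinate n-tuple by Gauss-Jordan elimination. 

```python
from fractions import Fraction

def coordinates(basis, v):
    """Solve sum_i x_i * basis[i] = v; `basis` is assumed to be a basis of R^n."""
    n = len(basis)
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [e / p for e in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def dual_basis(basis):
    """The coordinate functionals eps_j, each a function from vectors to scalars."""
    return [lambda v, j=j: coordinates(basis, v)[j] for j in range(len(basis))]

beta = [(1, 2), (3, 5)]            # sample ordered basis of R^2
eps = dual_basis(beta)
lam = lambda v: 4 * v[0] - v[1]    # a sample linear functional
xi = (7, -2)
# Expansion lam = sum_i lam(beta_i) * eps_i, evaluated at xi.
expansion = sum(lam(beta[i]) * eps[i](xi) for i in range(2))
```

The values `eps[j](beta[i])` form the Kronecker pattern used in part (a), and 
`expansion` agrees with `lam(xi)`, as in part (b). 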


Definition. The basis $\{\varepsilon_j\}$ for $V^*$ is called the dual of the basis $\{\beta_i\}$ for $V$. 


As usual, one of our fundamental isomorphisms is lurking behind all this, 
but we shall leave its exposure to an exercise. 


Corollary. $d(V^*) = d(V)$. 


The three equations 

$$\xi = \sum \varepsilon_i(\xi)\,\beta_i, \qquad \lambda = \sum \lambda(\beta_i)\,\varepsilon_i, \qquad \lambda(\xi) = \sum \lambda(\beta_i)\cdot\varepsilon_i(\xi)$$

are worth looking at. The first two are symmetrically related, each presenting 
the basis expansion of a vector with its coefficients computed by applying the 
corresponding element of the dual basis to the vector. The third is symmetric 
itself between $\xi$ and $\lambda$. 

Since a finite-dimensional space $V$ and its dual space $V^*$ have the same 
dimension, they are of course isomorphic. In fact, each basis for $V$ defines an 
isomorphism, for we have the associated coordinate isomorphism from $V$ to $\mathbb{R}^n$, 
the dual basis isomorphism from $\mathbb{R}^n$ to $V^*$, and therefore the composite isomor- 
phism from $V$ to $V^*$. This isomorphism varies with the basis, however, and there 
is in general no natural isomorphism between $V$ and $V^*$. 

It is another matter with Cartesian space $\mathbb{R}^n$ because it has a standard 
basis, and therefore a standard isomorphism with its dual space $(\mathbb{R}^n)^*$. It is 
not hard to see that this is the isomorphism $a \mapsto L_a$, where $L_a(x) = \sum_1^n a_i x_i$, 
that we discussed in Section 1.6. We can therefore feel free to identify $\mathbb{R}^n$ with 
$(\mathbb{R}^n)^*$, only keeping in mind that when we think of an n-tuple $a$ as a linear 
functional, we mean the functional $L_a(x) = \sum_1^n a_i x_i$. 


The second conjugate space. Despite the fact that $V$ and $V^*$ are not naturally 
isomorphic in general, we shall now see that $V$ is naturally isomorphic to $V^{**} = 
(V^*)^*$. 


Theorem 3.2. The function $\omega: V \times V^* \to \mathbb{R}$ defined by $\omega(\xi, f) = f(\xi)$ is 
bilinear, and the mapping $\xi \mapsto \omega^\xi$ from $V$ to $V^{**}$ is a natural isomorphism. 


Proof. In this context we generally set $\xi^{**} = \omega^\xi$, so that $\xi^{**}$ is defined by 
$\xi^{**}(f) = f(\xi)$ for all $f \in V^*$. The bilinearity of $\omega$ should be clear, and Theorem 
6.1 of Chapter 1 therefore applies. The reader might like to run through a 
direct check of the linearity of $\xi \mapsto \xi^{**}$ starting with $(c_1\xi_1 + c_2\xi_2)^{**}(f)$. 

There still is the question of the injectivity of this mapping. If $\alpha \ne 0$, we 
can find $f \in V^*$ so that $f(\alpha) \ne 0$. One way is to make $\alpha$ the first vector of an 
ordered basis and to take $f$ as the first functional in the dual basis; then $f(\alpha) = 1$. 
Since $\alpha^{**}(f) = f(\alpha) \ne 0$, we see in particular that $\alpha^{**} \ne 0$. The mapping 
$\xi \mapsto \xi^{**}$ is thus injective, and it is then bijective by the corollary of Theorem 
2.4. □ 


If we think of $V^{**}$ as being naturally identified with $V$ in this way, the two 
spaces $V$ and $V^*$ are symmetrically related to each other. Each is the dual of 
the other. In the expression '$f(\xi)$' we think of both symbols as variables and 
then hold one or the other fixed for the two interpretations. In such a situation 
we often use a more symmetric symbolism, such as $\langle \xi, f\rangle$, to indicate our inten- 
tion to treat both symbols as variables. 


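Lemma 3.1. If $\{\lambda_i\}$ is the basis in $V^*$ dual to the basis $\{\alpha_i\}$ in $V$, then 
$\{\alpha_i^{**}\}$ is the basis in $V^{**}$ dual to the basis $\{\lambda_i\}$ in $V^*$. 


Proof. We have $\alpha_i^{**}(\lambda_j) = \lambda_j(\alpha_i) = \delta^i_j$, which shows that $\alpha_i^{**}$ is the ith coordi- 
nate projection. In case the reader has forgotten, the basis expansion $f = \sum c_j\lambda_j$ 
implies that $\alpha_i^{**}(f) = f(\alpha_i) = (\sum c_j\lambda_j)(\alpha_i) = c_i$, so that $\alpha_i^{**}$ is the mapping 
$f \mapsto c_i$. □ 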


Annihilator subspaces. It is in this dual situation that orthogonality first 
naturally appears. However, we shall save the term 'orthogonal' for the later 
context in which $V$ and $V^*$ have been identified through a scalar product, and 
shall speak here of the annihilator of a set rather than its orthogonal com- 
plement. 




Definition. If $A \subset V$, the annihilator of $A$, $A^\circ$, is the set of all $f$ in $V^*$ such 
that $f(\alpha) = 0$ for all $\alpha$ in $A$. Similarly, if $A \subset V^*$, then 

$$A^\circ = \{\alpha \in V : f(\alpha) = 0 \text{ for all } f \in A\}.$$

If we view $V$ as $(V^*)^*$, the second definition is included in the first. 


The following properties are easily established and will be left as exercises: 

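1) $A^\circ$ is always a subspace. 

2) $A \subset B \implies B^\circ \subset A^\circ$. 

3) $(L(A))^\circ = A^\circ$. 

4) $(A \cup B)^\circ = A^\circ \cap B^\circ$. 

5) $A \subset A^{\circ\circ}$. 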

We now add one more crucial dimensional identity to those of the last 
section. 


Theorem 3.3. If $W$ is a subspace of $V$, then $d(V) = d(W) + d(W^\circ)$. 


Proof. Let $\{\beta_i\}_1^m$ be a basis for $W$, and extend it to a basis $\{\beta_i\}_1^n$ for $V$. Let 
$\{\lambda_i\}_1^n$ be the dual basis in $V^*$. We claim that then $\{\lambda_i\}_{m+1}^n$ is a basis for $W^\circ$. 
First, if $j > m$, then $\lambda_j(\beta_i) = 0$ for $i = 1, \ldots, m$, and so $\lambda_j$ is in $W^\circ$ by (3) 
above. Thus $\{\lambda_{m+1}, \ldots, \lambda_n\} \subset W^\circ$. Now suppose that $f \in W^\circ$, and let $f = 
\sum_{j=1}^n c_j\lambda_j$ be its (dual) basis expansion. Then for each $i \le m$ we have $c_i = 
f(\beta_i) = 0$, since $\beta_i \in W$ and $f \in W^\circ$; therefore, $f = \sum_{m+1}^n c_j\lambda_j$. Thus every $f$ in 
$W^\circ$ is in the span of $\{\lambda_i\}_{m+1}^n$. Altogether, we have shown that $W^\circ$ is the span of 
$\{\lambda_i\}_{m+1}^n$, as claimed. Then $d(W^\circ) + d(W) = (n - m) + m = n = d(V)$, and 
we are done. □ 
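
For a subspace of $\mathbb{R}^n$ the annihilator can be computed explicitly. The sketch below 
identifies $(\mathbb{R}^n)^*$ with $\mathbb{R}^n$ through $a \mapsto L_a$, so that $W^\circ$ becomes the set of 
n-tuples $a$ with $\sum a_i w_i = 0$ for every $w$ in a spanning set of $W$. The spanning 
vectors and the function names are our own choices, and the method (row reduction) 
is a standard one rather than the construction used in the proof. 

```python
from fractions import Fraction

def rref(matrix):
    """Row-reduce a matrix (list of rows); return (reduced rows, pivot columns)."""
    rows = [[Fraction(e) for e in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][c]
        rows[r] = [e / p for e in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def annihilator_basis(spanning_vectors):
    """Return (basis of the annihilator as n-tuples, dimension of the span)."""
    rows, pivots = rref(spanning_vectors)
    n = len(spanning_vectors[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        a = [Fraction(0)] * n
        a[free] = Fraction(1)
        for r, pc in enumerate(pivots):
            a[pc] = -rows[r][free]
        basis.append(a)
    return basis, len(pivots)

W_span = [(1, 0, 2, 1), (0, 1, 1, 1)]   # sample spanning set of W in R^4
ann, d_W = annihilator_basis(W_span)
```

Here `d_W + len(ann)` equals 4, in agreement with $d(V) = d(W) + d(W^\circ)$. 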


Corollary. $A^{\circ\circ} = L(A)$ for every subset $A \subset V$. 


Proof. Since $(L(A))^\circ = A^\circ$, we have $d(L(A)) + d(A^\circ) = d(V)$, by the 
theorem. Also $d(A^\circ) + d(A^{\circ\circ}) = d(V^*) = d(V)$. Thus $d(A^{\circ\circ}) = d(L(A))$, 
and since $L(A) \subset A^{\circ\circ}$, by (5) above, we have $L(A) = A^{\circ\circ}$. □ 


The adjoint of T. We shall now see that with every $T$ in $\mathrm{Hom}(V, W)$ there is 
naturally associated an element of $\mathrm{Hom}(W^*, V^*)$ which we call the adjoint of 
$T$ and designate $T^*$. One consequence of the intimate relationship between $T$ 
and $T^*$ is that the range of $T^*$ is exactly the annihilator of the null space of 
$T$. Combined with our dimensional identities, this implies that the ranges of $T$ 
and $T^*$ have the same dimension. And later on, after we have established the 
connection between matrix representations of $T$ and $T^*$, this turns into the very 
mysterious fact that the dimension of the linear span of the row vectors of an m- 
by-n matrix is the same as the dimension of the linear span of its column vectors, 
which gives us our notion of the rank of a matrix. In Chapter 5 we shall study a 
situation (Hilbert space) in which we are given a fixed fundamental isomorphism 
between $V$ and $V^*$. If $T$ is in $\mathrm{Hom}\,V$, then of course $T^*$ is in $\mathrm{Hom}\,V^*$, and we 
can use this isomorphism to "transfer" $T^*$ into $\mathrm{Hom}\,V$. But now $T$ can be com- 
pared with its (transferred) adjoint $T^*$, and they may be equal. That is, $T$ may 
be self-adjoint. It turns out that the self-adjoint transformations are "nice" ones, 
as we shall see for ourselves in simple cases, and also, fortunately, that many 
important linear maps arising from theoretical physics are self-adjoint. 

If $T \in \mathrm{Hom}(V, W)$ and $l \in W^*$, then of course $l \circ T \in V^*$. Moreover, the 
mapping $l \mapsto l \circ T$ ($T$ fixed) is a linear mapping from $W^*$ to $V^*$ by the corollary 
to Theorem 3.3 of Chapter 1. This mapping is called the adjoint of $T$ and is 
designated $T^*$. Thus $T^* \in \mathrm{Hom}(W^*, V^*)$ and $T^*(l) = l \circ T$ for all $l \in W^*$. 


Theorem 3.4. The mapping $T \mapsto T^*$ is an isomorphism from the vector 
space $\mathrm{Hom}(V, W)$ to the vector space $\mathrm{Hom}(W^*, V^*)$. Also $(T \circ S)^* = 
S^* \circ T^*$ under the relevant hypotheses on domains and codomains. 


Proof. Everything we have said above through the linearity of $T \mapsto T^*$ is a 
consequence of the bilinearity of $\omega(l, T) = l \circ T$. The map we have called $T^*$ 
is simply $\omega_T$, and the linearity of $T \mapsto T^*$ thus follows from Theorem 6.1 of 
Chapter 1. Again the reader might benefit from a direct linearity check, begin- 
ning with $(c_1T_1 + c_2T_2)^*(l)$. 

To see that $T \mapsto T^*$ is injective, we take any $T \ne 0$ and choose $\alpha \in V$ so 
that $T(\alpha) \ne 0$. We then choose $l \in W^*$ so that $l(T(\alpha)) \ne 0$. Since $l(T(\alpha)) = 
(T^*(l))(\alpha)$, we have verified that $T^* \ne 0$. 

Next, if $d(V) = m$ and $d(W) = n$, then also $d(V^*) = m$ and $d(W^*) = n$ 
by the corollary of Theorem 3.1, and $d(\mathrm{Hom}(V, W)) = mn = d(\mathrm{Hom}(W^*, V^*))$ 
by Theorem 2.5. The injective map $T \mapsto T^*$ is thus an isomorphism (by the 
corollary of Theorem 2.4). 

Finally, $(T \circ S)^* l = l \circ (T \circ S) = (l \circ T) \circ S = S^*(l \circ T) = S^*(T^*(l)) = 
(S^* \circ T^*) l$, so that $(T \circ S)^* = S^* \circ T^*$. □ 
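
Since $T^*(l)$ is just the composition $l \circ T$, the adjoint can be written down 
directly as a higher-order function. The Python sketch below uses sample maps of our 
own on $\mathbb{R}^2$ and $\mathbb{R}^3$ (all names are hypothetical) and compares $(T \circ S)^*(l)$ with 
$S^*(T^*(l))$ on a few vectors. 

```python
def adjoint(T):
    """T*(l) = l o T, where l is a functional on the codomain of T."""
    return lambda l: (lambda xi: l(T(xi)))

def compose(f, g):
    """The composition f o g."""
    return lambda x: f(g(x))

S = lambda v: (v[0] + v[1], 2 * v[0], v[1])      # linear map R^2 -> R^3
T = lambda w: (w[0] - w[2], w[1] + 3 * w[2])     # linear map R^3 -> R^2
l = lambda u: 5 * u[0] - 2 * u[1]                # functional on R^2

lhs = adjoint(compose(T, S))(l)                  # (T o S)*(l)
rhs = compose(adjoint(S), adjoint(T))(l)         # (S* o T*)(l)
samples = [(1, 0), (0, 1), (2, -3)]
```

Both `lhs` and `rhs` are functionals on $\mathbb{R}^2$, and they take the same values on the 
sample vectors; agreement on the two standard basis vectors already forces equality. 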


The reader would probably guess that $T^{**}$ becomes identified with $T$ under 
the identification of $V$ with $V^{**}$. This is so, and it is actually the reason for 
calling the isomorphism $\xi \mapsto \xi^{**}$ natural. We shall return to this question at the 
end of the section. Meanwhile, we record an important elementary identity. 


Theorem 3.5. $(R(T^*))^\circ = N(T)$ and $N(T^*) = (R(T))^\circ$. 


Proof. The following statements are definitionally equivalent in pairs as they 
occur: $l \in N(T^*)$, $T^*(l) = 0$, $l \circ T = 0$, $l(T(\xi)) = 0$ for all $\xi \in V$, $l \in (R(T))^\circ$. 
Therefore, $N(T^*) = (R(T))^\circ$. The other proof is similar and will be left to the 
reader. [Start with $\alpha \in N(T)$ and end with $\alpha \in (R(T^*))^\circ$.] □ 


The rank of a linear transformation is the dimension of its range space. 

Corollary. The rank of $T^*$ is equal to the rank of $T$. 


Proof. The dimensions of $R(T)$ and $(N(T))^\circ$ are each $d(V) - d(N(T))$ by 
Theorems 2.4 and 3.3, and the second is $d(R(T^*))$ by the above theorem. 
Therefore, $d(R(T)) = d(R(T^*))$. □ 
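
In matrix terms the corollary says that a matrix and its transpose have the same 
rank. The Python sketch below checks this on a sample matrix; it assumes the fact, 
established only later in the text, that the matrix of $T^*$ is the transpose of the 
matrix of $T$, and the function names are our own. 

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows), by forward elimination over the rationals."""
    rows = [[Fraction(e) for e in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def transpose(matrix):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*matrix)]

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1],
     [0, 2, 2]]           # sample 4-by-3 matrix
row_rank = rank(A)
column_rank = rank(transpose(A))
```

For this matrix the two ranks agree, although the rows live in $\mathbb{R}^3$ and the columns 
in $\mathbb{R}^4$. 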




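Dyads. Consider any $T$ in $\mathrm{Hom}(V, W)$ whose range $M$ is one-dimensional. If $\beta$ 
is a nonzero vector in $M$, then $x \mapsto x\beta$ is a basis isomorphism $\theta: \mathbb{R} \to M$ and 
$\theta^{-1} \circ T: V \to \mathbb{R}$ is a linear functional $\lambda \in V^*$. Then $T = \theta \circ \lambda$ and $T(\xi) = 
\lambda(\xi)\beta$ for all $\xi$. We write this as $T = \lambda(\cdot)\beta$, and call any such $T$ a dyad. 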


Lemma 3.2. If $T$ is the dyad $\lambda(\cdot)\beta$, then $T^*$ is the dyad $\beta^{**}(\cdot)\lambda$. 


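Proof. $(T^*(l))(\xi) = (l \circ T)(\xi) = l(T(\xi)) = l(\lambda(\xi)\beta) = l(\beta)\lambda(\xi)$, so that 
$T^*(l) = l(\beta)\lambda = \beta^{**}(l)\lambda$, and $T^* = \beta^{**}(\cdot)\lambda$. □ 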
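
The computation in this proof can be replayed on numbers. In the Python sketch below 
the functional `lam`, the vector `beta`, and the functional `l` are sample choices of 
ours with $V = \mathbb{R}^2$ and $W = \mathbb{R}^3$; the helper names are hypothetical. 

```python
def dyad(lam, beta):
    """The dyad T = lam(.) beta, that is, T(xi) = lam(xi) * beta."""
    return lambda xi: tuple(lam(xi) * b for b in beta)

def adjoint(T):
    """T*(l) = l o T."""
    return lambda l: (lambda xi: l(T(xi)))

lam = lambda v: 2 * v[0] - v[1]          # functional on R^2
beta = (1, 0, 4)                         # nonzero vector in R^3
l = lambda w: w[0] + w[1] - 3 * w[2]     # functional on R^3
T = dyad(lam, beta)
xi = (5, 7)

left = adjoint(T)(l)(xi)                 # (T*(l))(xi)
right = l(beta) * lam(xi)                # beta**(l) * lam(xi)
```

The two values `left` and `right` coincide, as the lemma predicts. 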


*Natural isomorphisms again. We are now in a position to illustrate more 
precisely the notion of a natural isomorphism. We saw above that among all the 
isomorphisms from a finite-dimensional vector space $V$ to its second dual, we 
could single one out naturally, namely, the map $\xi \mapsto \xi^{**}$, where $\xi^{**}(f) = f(\xi)$ 
for all $f$ in $V^*$. Let us call this isomorphism $\varphi_V$. The technical meaning of the 
word 'natural' pertains to the collection $\{\varphi_V\}$ of all these isomorphisms; we 
found a way to choose one isomorphism $\varphi_V$ for each space $V$, and the proof that 
this is a "natural" choice lies in the smooth way the various $\varphi_V$'s relate to each 
other. To see what we mean by this, consider two finite-dimensional spaces $V$ 
and $W$ and a map $T$ in $\mathrm{Hom}(V, W)$. Then $T^*$ is in $\mathrm{Hom}(W^*, V^*)$ and $T^{**} = 
(T^*)^*$ is in $\mathrm{Hom}(V^{**}, W^{**})$. The setting for the four maps $T$, $T^{**}$, $\varphi_V$, and $\varphi_W$ 
can be displayed in a diagram as follows: 


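$$\begin{array}{ccc}
V & \xrightarrow{\;T\;} & W \\
\downarrow{\scriptstyle \varphi_V} & & \downarrow{\scriptstyle \varphi_W} \\
V^{**} & \xrightarrow{\;T^{**}\;} & W^{**}
\end{array}$$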


The diagram indicates two maps, $\varphi_W \circ T$ and $T^{**} \circ \varphi_V$, from $V$ to $W^{**}$, and we 
define the collection of isomorphisms $\{\varphi_V\}$ to be natural if these two maps are 
always equal for any $V$, $W$, and $T$. This is the condition that the two ways of 
going around the diagram give the same result, i.e., that the diagram be com- 
mutative. 

Put another way, it is the condition that $T$ "become" $T^{**}$ when $V$ is identi- 
fied with $V^{**}$ (by $\varphi_V$) and $W$ is identified with $W^{**}$ (by $\varphi_W$). We leave its proof 
as an exercise. 


EXERCISES 


3.1 Let $\theta$ be an isomorphism from a vector space V to $\mathbb{R}^n$. Show that the functionals
$\{\pi_i \circ \theta\}_1^n$ form a basis for V*.

3.2 Show that the standard isomorphism from $\mathbb{R}^n$ to $(\mathbb{R}^n)^*$ that we get by composing
the coordinate isomorphism for the standard basis for $\mathbb{R}^n$ (the identity) with the dual
basis isomorphism for $(\mathbb{R}^n)^*$ is just our friend $a \mapsto l_a$, where $l_a(x) = \sum_1^n a_i x_i$. (Show
that the dual basis isomorphism is $a \mapsto \sum_1^n a_i \pi_i$.)




3.3 We know from Theorem 1.6 that a choice of a basis $\{\beta_i\}$ for V defines an
isomorphism from $W^n$ to Hom(V, W) for any vector space W. Apply this fact and Theorem
1.3 to obtain a basis in V*, and show that this basis is the dual basis of $\{\beta_i\}$.


3.4 Prove the properties of A° that are listed in the text.


3.5 Find (a basis for) the annihilator of $\langle 1,1,1 \rangle$ in $\mathbb{R}^3$. (Use the isomorphism
of $(\mathbb{R}^3)^*$ with $\mathbb{R}^3$ to express the basis vectors as triples.)


3.6 Find (a basis for) the annihilator of $\{\langle 1,1,1 \rangle, \langle 1,2,3 \rangle\}$ in $\mathbb{R}^3$.
3.7 Find (a basis for) the annihilator of $\{\langle 1,1,1,1 \rangle, \langle 1,2,3,4 \rangle\}$ in $\mathbb{R}^4$.
3.8 Show that if $V = M \oplus N$, then $V^* = M^\circ \oplus N^\circ$.


3.9 Show that if M is any subspace of an n-dimensional vector space V and d(M) =
m, then M can be viewed as being the linear span of an independent subset of m
elements of V or as being the annihilator of (the intersection of the null spaces of) an
independent subset of n - m elements of V*.


3.10 If $B = \{f_i\}_1^m$ is a finite collection of linear functionals on V ($B \subset V^*$), then its
annihilator B° is simply the intersection $N = \bigcap_1^m N_i$ of the null spaces $N_i = N(f_i)$
of the functionals $f_i$. State the dual of Theorem 3.3 in this context. That is, take W
as the linear span of the functionals $f_i$, so that $W \subset V^*$ and $W^\circ \subset V$. State the dual
of the corollary.


3.11 Show that the following theorem is a consequence of the corollary of Theorem 3.3. 


Theorem. Let N be the intersection $\bigcap_1^m N_i$ of the null spaces of a set $\{f_i\}_1^m$ of
linear functionals on V, and suppose that g in V* is zero on N. Then g is a linear
combination of the set $\{f_i\}_1^m$.


3.12 A corollary of Theorem 3.3 is that if W is a proper subspace of V, then there is 
at least one nonzero linear functional f in V* such that f = 0 on W. Prove this fact 
directly by elementary means. (You are allowed to construct a suitable basis.) 


3.13 An m-tuple of linear functionals $\{f_i\}_1^m$ on a vector space V defines a linear
mapping $\alpha \mapsto \langle f_1(\alpha), \ldots, f_m(\alpha) \rangle$ from V to $\mathbb{R}^m$. What theorem is being applied
here? Prove that the range of this linear mapping is the whole of $\mathbb{R}^m$ if and only if
$\{f_i\}_1^m$ is an independent set of functionals. [Hint: If the range is a proper subspace W,
there is a nonzero m-tuple a such that $\sum_1^m a_i x_i = 0$ for all $x \in W$.]


3.14 Continuing the above exercise, what is the null space N of the linear mapping
$\alpha \mapsto \langle f_1(\alpha), \ldots, f_m(\alpha) \rangle$? If g is a linear functional which is zero on N, show that g
is a linear combination of the $f_i$, now as a corollary of the above exercise and Theorem
4.3 of Chapter 1. (Assume the set $\{f_i\}_1^m$ independent.)


3.15 Write out from scratch the proof that T* is linear [for a given T in Hom(V, W)].
Also prove directly that $T \mapsto T^*$ is linear.


3.16 Prove the other half of Theorem 3.5. 


3.17 Let $\theta_i$ be the isomorphism $\alpha \mapsto \alpha^{**}$ from $V_i$ to $V_i^{**}$ for i = 1, 2, and suppose
given T in Hom($V_1$, $V_2$). The loose statement T = T** means exactly that

$$T^{**} = \theta_2 \circ T \circ \theta_1^{-1} \quad\text{or}\quad T^{**} \circ \theta_1 = \theta_2 \circ T.$$

Prove this identity. As usual, do this by proving that it holds for each $\alpha$ in $V_1$.




3.18 Let $\theta: \mathbb{R}^n \to V$ be a basis isomorphism. Prove that the adjoint $\theta^*$ is the
coordinate isomorphism for the dual basis if $(\mathbb{R}^n)^*$ is identified with $\mathbb{R}^n$ in the natural way.
3.19 Let $\omega$ be any bilinear functional on $V \times W$. Then the two associated linear
transformations are $T: V \to W^*$ defined by $(T(\xi))(\eta) = \omega(\xi, \eta)$ and $S: W \to V^*$
defined by $(S(\eta))(\xi) = \omega(\xi, \eta)$. Prove that S = T* if W is identified with W**.
3.20 Suppose that f in $(\mathbb{R}^m)^*$ has coordinate m-tuple a [$f(y) = \sum_1^m a_i y_i$] and that T
in Hom($\mathbb{R}^n$, $\mathbb{R}^m$) has matrix $t = \{t_{ij}\}$. Write out the explicit expression of the number
f(T(x)) in terms of all these coordinates. Rearrange the sum so that it appears in
the form

$$g(x) = \sum_{j=1}^{n} b_j x_j,$$

and then read off the formula for b in terms of a.


4. MATRICES 


Matrices and linear transformations. The reader has already learned something
about matrices and their relationship to linear transformations from Chapter 1;
we shall begin our more systematic discussion by reviewing this earlier material.
By popular conception a matrix is a rectangular array of numbers such as

$$\begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{bmatrix}$$

Note that the first index numbers the rows and the second index numbers the
columns. If there are m rows and n columns in the array, it is called an m-by-n
($m \times n$) matrix. This notion is inexact. A rectangular array is a way of picturing
a matrix, but a matrix is really a function, just as a sequence is a function. With
the notation $\bar m = \{1, \ldots, m\}$, the above matrix is a function assigning a number
to every pair of integers $\langle i, j \rangle$ in $\bar m \times \bar n$. It is thus an element of the set
$\mathbb{R}^{\bar m \times \bar n}$. The addition of two $m \times n$ matrices is performed in the obvious
place-by-place way, and is merely the addition of two functions in $\mathbb{R}^{\bar m \times \bar n}$; the same is
true for scalar multiplication. The set of all $m \times n$ matrices is thus the vector
space $\mathbb{R}^{\bar m \times \bar n}$, a Cartesian space with a rather fancy finite index set. We shall use
the customary index notation $t_{ij}$ for the value t(i, j) of the function t at $\langle i, j \rangle$,
and we shall also write $\{t_{ij}\}$ for t, just as we do for sequences and other indexed
collections.

The additional properties of matrices stem from the correspondence between
$m \times n$ matrices $\{t_{ij}\}$ and transformations $T \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$).

The following theorem restates results from the first chapter. See Theorems
1.2, 1.3, and 6.2 of Chapter 1 and the discussion of the linear combination map
at the end of Section 1.6.




Theorem 4.1. Let $\{t_{ij}\}$ be an m-by-n matrix, and let $t^j$ be the m-tuple that
is its jth column for j = 1, ..., n. Then there is a unique T in Hom($\mathbb{R}^n$, $\mathbb{R}^m$)
such that skeleton $T = \{t^j\}$, i.e., such that $T(\delta^j) = t^j$ for all j. T is defined
as the linear combination mapping $x \mapsto y = \sum_{j=1}^n x_j t^j$, and an equivalent
presentation of T is the collection of scalar equations

$$y_i = \sum_{j=1}^{n} t_{ij} x_j \quad\text{for } i = 1, \ldots, m.$$

Each T in Hom($\mathbb{R}^n$, $\mathbb{R}^m$) arises this way, and the bijection $\{t_{ij}\} \mapsto T$ from
$\mathbb{R}^{\bar m \times \bar n}$ to Hom($\mathbb{R}^n$, $\mathbb{R}^m$) is a natural isomorphism.
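The two presentations of T in the theorem, as a linear combination of columns and as a system of scalar equations, can be compared directly. The following is a minimal sketch under the assumption that a matrix is stored as a Python list of rows; the function names are hypothetical and chosen only for this illustration.

```python
# Sketch only: a matrix is a list of m rows, each a list of n numbers.

def apply_by_columns(t, x):
    """y = sum over j of x_j * t^j, where t^j is the jth column of t."""
    m, n = len(t), len(t[0])
    y = [0] * m
    for j in range(n):
        column_j = [t[i][j] for i in range(m)]
        y = [yi + x[j] * c for yi, c in zip(y, column_j)]
    return y

def apply_by_rows(t, x):
    """y_i = sum over j of t_ij * x_j, one scalar equation per row."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

t = [[1, 2, 0],
     [3, -1, 4]]          # a 2-by-3 matrix, so T maps R^3 to R^2
x = [2, 1, -1]

# Both presentations define the same transformation.
assert apply_by_columns(t, x) == apply_by_rows(t, x)

# The skeleton of T: the standard basis vector delta^j is sent to column j.
delta_2 = [0, 1, 0]
assert apply_by_rows(t, delta_2) == [2, -1]
```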


The only additional remark called for here is that in identifying an $m \times n$
matrix with an n-tuple of m-tuples, we are making use of one of the standard
identifications of duality (Section 0.10). We are treating the natural isomorphism
between the really distinct spaces $\mathbb{R}^{\bar m \times \bar n}$ and $(\mathbb{R}^{\bar m})^{\bar n}$ as though it were the identity.

We can also relate T to $\{t_{ij}\}$ by way of the rows of $\{t_{ij}\}$. As above, taking
ith coordinates in the m-tuple equation $y = \sum_{j=1}^n x_j t^j$, we get the equivalent
and familiar system of numerical (scalar) equations $y_i = \sum_{j=1}^n t_{ij} x_j$ for
i = 1, ..., m. Now the mapping $x \mapsto \sum_{j=1}^n c_j x_j$ from $\mathbb{R}^n$ to $\mathbb{R}$ is the most
general linear functional on $\mathbb{R}^n$. In the above numerical equations, therefore, we
have simply used the m rows of the matrix $\{t_{ij}\}$ to present the m-tuple of linear
functionals on $\mathbb{R}^n$ which is equivalent to the single m-tuple-valued linear
mapping T in Hom($\mathbb{R}^n$, $\mathbb{R}^m$) by Theorem 3.6 of Chapter 1.

The choice of ordered bases for arbitrary finite-dimensional spaces V and W
allows us to transfer the above theorem to Hom(V, W). Since we are now going
to correlate a matrix t in $\mathbb{R}^{\bar m \times \bar n}$ with a transformation T in Hom(V, W), we shall
designate the transformation in Hom($\mathbb{R}^n$, $\mathbb{R}^m$) discussed above by $\bar T$.


Theorem 4.2. Let $\{\alpha_j\}_1^n$ and $\{\beta_i\}_1^m$ be ordered bases for the vector spaces V
and W, respectively. For each matrix $\{t_{ij}\}$ in $\mathbb{R}^{\bar m \times \bar n}$ let T be the unique
element of Hom(V, W) such that $T(\alpha_j) = \sum_{i=1}^m t_{ij}\beta_i$ for j = 1, ..., n.
Then the mapping $\{t_{ij}\} \mapsto T$ is an isomorphism from $\mathbb{R}^{\bar m \times \bar n}$ to Hom(V, W).


Proof. We simply combine the isomorphism $\{t_{ij}\} \mapsto \bar T$ of the above theorem
with the isomorphism $\bar T \mapsto T = \psi \circ \bar T \circ \varphi^{-1}$ from Hom($\mathbb{R}^n$, $\mathbb{R}^m$) to Hom(V, W),
where $\varphi$ and $\psi$ are the two given basis isomorphisms. Then T is the
transformation described in the theorem, for
$T(\alpha_j) = \psi(\bar T(\varphi^{-1}(\alpha_j))) = \psi(\bar T(\delta^j)) = \psi(t^j) = \sum_{i=1}^m t_{ij}\beta_i$.
The map $\{t_{ij}\} \mapsto T$ is the composition of two isomorphisms
and so is an isomorphism. □


It is instructive to look at what we have just done in a slightly different way.
Given the matrix $\{t_{ij}\}$, let $\tau_j$ be the vector in W whose coordinate m-tuple is the
jth column $t^j$ of the matrix, so that $\tau_j = \sum_{i=1}^m t_{ij}\beta_i$. Then let T be the unique
element of Hom(V, W) such that $T(\alpha_j) = \tau_j$ for j = 1, ..., n. Now we have
obtained T from $\{t_{ij}\}$ in the following two steps: T corresponds to the n-tuple
$\{\tau_j\}_1^n$ under the isomorphism from Hom(V, W) to $W^n$ given by Theorem 1.6,
and $\{\tau_j\}_1^n$ corresponds to the matrix $\{t_{ij}\}$ by extension of the coordinate
isomorphism between W and $\mathbb{R}^m$ to its product isomorphism from $W^n$ to $(\mathbb{R}^m)^n$.


Corollary. If y is the coordinate m-tuple of the vector $\eta$ in W and x is the
coordinate n-tuple of $\xi$ in V (with respect to the given bases), then $\eta = T(\xi)$
if and only if $y_i = \sum_{j=1}^n t_{ij} x_j$ for i = 1, ..., m.


Proof. We know that the scalar equations are equivalent to $y = \bar T(x)$, which is
the equation $y = \psi^{-1} \circ T \circ \varphi(x)$. The isomorphism $\psi$ converts this to the
equation $\eta = T(\xi)$. □


Our problem now is to discover the matrix analogues of relationships between
linear transformations. For transformations between the Cartesian spaces $\mathbb{R}^n$
this is a fairly direct, uncomplicated business, because, as we know, the matrix
here is a natural alter ego for the transformation. But when we leave the
Cartesian spaces, a transformation T no longer has a matrix in any natural way, and
only acquires one when bases are chosen and a corresponding $\bar T$ on Cartesian
spaces is thereby obtained. All matrices now are determined with respect to
chosen bases, and all calculations are complicated by the necessary presence of
the basis and coordinate isomorphisms. There are two ways of handling this
situation. The first, which we shall follow in general, is to describe things
directly for the general space V and simply to accept the necessarily more
complicated statements involving bases and dual bases and the corresponding loss in
transparency. The other possibility is first to read off the answers for the
Cartesian spaces and then to transcribe them via coordinate isomorphisms.


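Lemma 4.1. The matrix element $t_{kj}$ can be obtained from T by the formula

$$t_{kj} = \mu_k(T(\alpha_j)),$$

where $\mu_k$ is the kth element of the dual basis in W*.

Proof. $\mu_k(T(\alpha_j)) = \mu_k\left(\sum_{i=1}^m t_{ij}\beta_i\right) = \sum_i t_{ij}\mu_k(\beta_i) = \sum_i t_{ij}\delta_i^k = t_{kj}$. □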


In terms of Cartesian spaces, $\bar T(\delta^j)$ is the jth column m-tuple $t^j$ in the
matrix $\{t_{ij}\}$ of T, and $t_{kj}$ is the kth coordinate of $t^j$. From the point of view of
linear maps, the kth coordinate is obtained by applying the kth coordinate
projection $\pi_k$, so that $t_{kj} = \pi_k(\bar T(\delta^j))$. Under the basis isomorphisms, $\pi_k$
becomes $\mu_k$, $\bar T$ becomes T, $\delta^j$ becomes $\alpha_j$, and the Cartesian identity becomes
the identity of the lemma.


The transpose. The transpose of the $m \times n$ matrix $\{t_{ij}\}$ is the $n \times m$ matrix
$\{t^*_{ij}\}$ defined by $t^*_{ij} = t_{ji}$ for all i, j. The rows of t* are of course the columns of t,
and conversely.


Theorem 4.3. The matrix of T* with respect to the dual bases in W* and
V* is the transpose of the matrix of T.


Proof. If s is the matrix of T*, then Lemmas 3.1 and 4.1 imply that

$$s_{ji} = \alpha_j^{**}(T^*(\mu_i)) = \alpha_j^{**}(\mu_i \circ T) = (\mu_i \circ T)(\alpha_j) = \mu_i(T(\alpha_j)) = t_{ij}. \ \square$$


Definition. The row space of the matrix $\{t_{ij}\} \in \mathbb{R}^{\bar m \times \bar n}$ is the subspace of $\mathbb{R}^n$
spanned by the m row vectors. The column space is similarly the span of
the n column vectors in $\mathbb{R}^m$.


Corollary. The row and column spaces of a matrix have the same dimension. 


Proof. If T is the element of Hom($\mathbb{R}^n$, $\mathbb{R}^m$) defined by $T(\delta^j) = t^j$, then the
set $\{t^j\}_1^n$ of column vectors in the matrix $\{t_{ij}\}$ is the image under T of the
standard basis of $\mathbb{R}^n$, and so its span, which we have called the column space of the
matrix, is exactly the range of T. In particular, the dimension of the column
space is d(R(T)) = rank T.

Since the matrix of T* is the transpose t* of the matrix t, we have, similarly,
that rank T* is the dimension of the column space of t*. But the column space
of t* is the row space of t, and the assertion of the corollary is thus reduced to
the identity rank T* = rank T, which is the corollary of Theorem 3.5. □


This common dimension is called the rank of the matrix. 
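The equality of row and column rank can be observed numerically. The following is a minimal sketch, not the book's method of proof: it computes the dimension of the row space by Gaussian elimination with exact rational arithmetic, and obtains the column rank as the row rank of the transpose. The function names `rank` and `transpose` are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Dimension of the row space, found by Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def transpose(t):
    """Rows of the result are the columns of t."""
    return [list(col) for col in zip(*t)]

t = [[-5, 2, 3], [-10, 4, 6]]            # a 2-by-3 example with proportional rows
u = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # a 3-by-3 example

# Row rank (rank of t) equals column rank (rank of the transpose).
assert rank(t) == rank(transpose(t))
assert rank(u) == rank(transpose(u))
```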


Matrix products. If $T \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$) and $S \in$ Hom($\mathbb{R}^m$, $\mathbb{R}^l$), then of course
$R = S \circ T \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^l$), and it certainly should be possible to calculate the
matrix r of R from the matrices s and t of S and T, respectively. To make this
computation, we set y = T(x) and z = S(y), so that $z = (S \circ T)(x) = R(x)$.
The equivalent scalar equations in terms of the matrices t and s are

$$y_i = \sum_{h=1}^{n} t_{ih} x_h \quad\text{and}\quad z_k = \sum_{i=1}^{m} s_{ki} y_i,$$

so that

$$z_k = \sum_{i=1}^{m} s_{ki} \sum_{h=1}^{n} t_{ih} x_h = \sum_{h=1}^{n} \left( \sum_{i=1}^{m} s_{ki} t_{ih} \right) x_h.$$

But $z_k = \sum_{h=1}^n r_{kh} x_h$ for k = 1, ..., l. Taking x as $\delta^j$, we have

$$r_{kj} = \sum_{i=1}^{m} s_{ki} t_{ij} \quad\text{for all } k \text{ and } j.$$

We thus have found the formula for the matrix r of the map $R = S \circ T: x \mapsto z$.
Of course, r is defined to be the product of the matrices s and t, and we write
$r = s \cdot t$ or r = st.
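The product formula translates directly into code. The following is a minimal sketch, assuming matrices are stored as lists of rows; `matmul` and `apply` are hypothetical names. It also checks, on one example, that the matrix st represents the composition of S with T.

```python
def matmul(s, t):
    """Product r = st, with r[k][j] = sum over i of s[k][i] * t[i][j]."""
    m = len(t)
    if any(len(row) != m for row in s):
        raise ValueError("columns of the left factor must match rows of the right factor")
    n = len(t[0])
    return [[sum(s[k][i] * t[i][j] for i in range(m)) for j in range(n)]
            for k in range(len(s))]

def apply(t, x):
    """y_i = sum over j of t[i][j] * x[j]."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

t = [[1, 2, 0],
     [3, -1, 4]]                             # 2-by-3: T maps R^3 to R^2
s = [[1, 1], [0, 2], [-1, 0], [2, 3]]        # 4-by-2: S maps R^2 to R^4
r = matmul(s, t)                             # 4-by-3: R = S o T maps R^3 to R^4

x = [2, 1, -1]
# Applying r agrees with applying t and then s.
assert apply(r, x) == apply(s, apply(t, x))
```

Note that `matmul(t, s)` is not defined for these shapes, which reflects the requirement that the number of columns of the left factor equal the number of rows of the right factor.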

Note that in order for the product st to be defined, the number of columns
in the left factor must equal the number of rows in the right factor. We get the
element $r_{kj}$ by going across the kth row of s and simultaneously down the jth
column of t, multiplying corresponding elements as we go, and adding the
resulting products. This process is illustrated in Fig. 2.1. In terms of the scalar
product $(x, y) = \sum_1^n x_i y_i$ on $\mathbb{R}^n$, we see that the element $r_{kj}$ in r = st is the
scalar product of the kth row of s and the jth column of t.


[Fig. 2.1. (l by m) × (m by n) = (l by n): the kth row of s and the jth column of t give the entry $r_{kj}$ of $s \cdot t = r$.]


Since we have defined the product of two matrices as the matrix of the
product of the corresponding transformations, i.e., so that the mapping $T \mapsto \{t_{ij}\}$
preserves products ($S \circ T \mapsto st$), it follows from the general principle of
Theorem 4.1 of Chapter 1 that the algebraic laws satisfied by composition of
transformations will automatically hold for the product of matrices. For
example, we know without making an explicit computation that matrix
multiplication is associative. Then for square matrices we have the following theorem.


Theorem 4.4. The set $M_n$ of square $n \times n$ matrices is an algebra naturally
isomorphic to the algebra Hom($\mathbb{R}^n$).


Proof. We already know that $T \mapsto \{t_{ij}\}$ is a natural linear isomorphism from
Hom($\mathbb{R}^n$) to $M_n$ (Theorem 4.1), and we have defined the product of matrices
so that the mapping also preserves multiplication. The laws of algebra (for an
algebra) therefore follow for $M_n$ from our observation in Theorem 3.5 of Chapter
1 that they hold for Hom($\mathbb{R}^n$). □


The identity I in Hom($\mathbb{R}^n$) takes the basis vector $\delta^j$ into itself, and therefore
its matrix e has $\delta^j$ for its jth column: $e^j = \delta^j$. Thus $e_{ij} = \delta_i^j = 1$ if i = j
and $e_{ij} = \delta_i^j = 0$ if $i \neq j$. That is, the matrix e is 1 along the main diagonal
(from upper left to lower right) and 0 elsewhere. Since $I \mapsto e$ under the algebra
isomorphism $T \mapsto t$, we know that e is the identity for matrix multiplication.
Of course, we can check this directly: $\sum_{j=1}^n t_{ij} e_{jk} = t_{ik}$, and similarly for
multiplying by e on the left. The symbol 'e' is ambiguous in that we have used it
to denote the identity in the space $\mathbb{R}^{\bar n \times \bar n}$ of square $n \times n$ matrices for any n.


Corollary. A square $n \times n$ matrix t has a multiplicative inverse if and only
if its rank is n.




Proof. By the theorem there exists an $s \in M_n$ such that st = ts = e if and
only if there exists an $S \in$ Hom($\mathbb{R}^n$) such that $S \circ T = T \circ S = I$. But such
an S exists if and only if T is an isomorphism, and by the corollary to Theorem 2.4
this is equivalent to the dimension of the range of T being n. But this dimension
is the rank of t, and the argument is complete. □


A square matrix (or a transformation in Hom V) is said to be nonsingular 
if it is invertible. 


Theorem 4.5. If $\{\alpha_i\}_1^n$, $\{\beta_j\}_1^m$, and $\{\gamma_k\}_1^l$ are ordered bases for the
vector spaces U, V, and W, respectively, and if $T \in$ Hom(U, V) and
$S \in$ Hom(V, W), then the matrix of $S \circ T$ is the product of the matrices
of S and T (with respect to the given bases).


Proof. By definition the matrix of $S \circ T$ is the matrix of $\overline{S \circ T} = \chi^{-1} \circ (S \circ T) \circ \varphi$
in Hom($\mathbb{R}^n$, $\mathbb{R}^l$), where $\varphi$ and $\chi$ are the given basis isomorphisms for U and W.
But if $\psi$ is the basis isomorphism for V, we have

$$\overline{S \circ T} = (\chi^{-1} \circ S \circ \psi) \circ (\psi^{-1} \circ T \circ \varphi) = \bar S \circ \bar T,$$

and therefore its matrix is the product of the matrices of $\bar S$ and $\bar T$ by the
definition of matrix multiplication. The latter are the matrices of S and T with respect
to the given bases. Putting these observations together, we have the theorem. □


There is a simple relationship between matrix products and transposition. 


Theorem 4.6. If the matrix product st is defined, then so is t*s*, and 
t*s* = (st)*. 


Proof. A direct calculation is easy. We have

$$(st)^*_{jk} = (st)_{kj} = \sum_{i=1}^{m} s_{ki} t_{ij} = \sum_{i=1}^{m} t^*_{ji} s^*_{ik} = (t^* s^*)_{jk}.$$

Thus (st)* = t*s*, as asserted. □
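The identity can be spot-checked on a rectangular example. This is a minimal sketch with hypothetical helper names `matmul` and `transpose`, assuming matrices are lists of rows; a single example illustrates the identity but of course does not prove it.

```python
def matmul(s, t):
    """Product st, with entry (k, j) equal to sum over i of s[k][i] * t[i][j]."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))] for k in range(len(s))]

def transpose(t):
    """The matrix t*, whose rows are the columns of t."""
    return [list(col) for col in zip(*t)]

s = [[1, 2], [3, 4], [5, 6]]            # 3-by-2
t = [[1, 0, 2, -1], [0, 1, 3, 4]]       # 2-by-4

left = transpose(matmul(s, t))                  # (st)*, a 4-by-3 matrix
right = matmul(transpose(t), transpose(s))      # t* s*, also 4-by-3
assert left == right
```

The order of the factors reverses because t* is 4-by-2 and s* is 2-by-3, so only the product t*s* is defined.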


This identity is clearly the matrix form of the transformation identity
$(S \circ T)^* = T^* \circ S^*$, and it can be deduced from the latter identity if desired.


Cartesian vectors as matrices. We can view an n-tuple $x = \langle x_1, \ldots, x_n \rangle$
as being alternatively either an $n \times 1$ matrix, in which case we call it a column
vector, or a $1 \times n$ matrix, in which case we call it a row vector. Of course, these
identifications are natural isomorphisms. The point of doing this is, in part, that
then the equations $y_i = \sum_{j=1}^n t_{ij} x_j$ say exactly that the column vector y is the
matrix product of t and the column vector x, that is, $y = t \cdot x$. The linear map
$T: \mathbb{R}^n \to \mathbb{R}^m$ becomes left multiplication by the fixed matrix t when $\mathbb{R}^n$ is viewed as
the space of $n \times 1$ column vectors. For this reason we shall take the column
vector as the standard matrix interpretation of an n-tuple x; then x* is the
corresponding row vector.




In particular, a linear functional $F \in (\mathbb{R}^n)^*$ becomes left multiplication by
its matrix, which is of course $1 \times n$ (F being from $\mathbb{R}^n$ to $\mathbb{R}^1$), and therefore is
simply the row matrix interpretation of an n-tuple in $\mathbb{R}^n$. That is, in the natural
isomorphism $a \mapsto L_a$ from $\mathbb{R}^n$ to $(\mathbb{R}^n)^*$, where $L_a(x) = \sum_1^n a_i x_i$, the functional
$L_a$ can now be interpreted as left matrix multiplication by the n-tuple a viewed
as the row vector a*. The matrix product of the row vector ($1 \times n$ matrix) a*
and the column vector ($n \times 1$ matrix) x is a $1 \times 1$ matrix $a^* \cdot x$, that is, a
number.

Let us now see what these observations say about T*. The number $L_a(T(x))$
is the $1 \times 1$ matrix a*tx. Since $L_a(T(x)) = (T^*(L_a))(x)$ by the definition of
T*, we see that the functional $T^*(L_a)$ is left multiplication by the row vector
a*t. Since the row vector form of $L_a$ is a* and the row vector form of $T^*(L_a)$ is
a*t, this shows that when the functionals on $\mathbb{R}^n$ are interpreted as row vectors,
T* becomes right multiplication by t. This only repeats something we already
know. If we take transposes to throw the row vectors into the standard column
vector form for n-tuples, it shows that T* is left multiplication by t*, and so
gives another proof that the matrix of T* is t*.


Change of basis. If $\varphi: x \mapsto \xi = \sum_1^n x_i\beta_i$ and $\theta: y \mapsto \xi = \sum_1^n y_i\beta_i'$ are two basis
isomorphisms for V, then $A = \theta^{-1} \circ \varphi$ is the isomorphism in Hom($\mathbb{R}^n$) which
takes the coordinate n-tuple x of a vector $\xi$ with respect to the basis $\{\beta_i\}$ into the
coordinate n-tuple y of the same vector with respect to the basis $\{\beta_i'\}$. The
isomorphism A is called the "change of coordinates" isomorphism. In terms
of the matrix a of A, we have y = ax, as above.

The change of coordinate map $A = \theta^{-1} \circ \varphi$ should not be confused with the
similar looking $T = \theta \circ \varphi^{-1}$. The latter is a mapping on V, and is the element
of Hom(V) which takes each $\beta_i$ to $\beta_i'$.


[Fig. 2.2. Commutative diagram of the maps among V, W, $\mathbb{R}^n$, and $\mathbb{R}^m$.]


We now want to see what happens to the matrix of a transformation
$T \in$ Hom(V, W) when we change bases in its domain and codomain spaces.
Suppose then that $\varphi_1$ and $\varphi_2$ are basis isomorphisms from $\mathbb{R}^n$ to V, that $\psi_1$ and $\psi_2$
are basis isomorphisms from $\mathbb{R}^m$ to W, and that t' and t'' are the matrices of T
with respect to the first and second bases, respectively. That is, t' is the matrix
of $T' = (\psi_1)^{-1} \circ T \circ \varphi_1 \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$), and similarly for t''. The mapping
$A = \varphi_2^{-1} \circ \varphi_1 \in$ Hom($\mathbb{R}^n$) is the change of coordinates transformation for
V: if x is the coordinate n-tuple of a vector $\xi$ with respect to the first basis
[that is, $\xi = \varphi_1(x)$], then A(x) is its coordinate n-tuple with respect to the second
basis. Similarly, let B be the change of coordinates map $\psi_2^{-1} \circ \psi_1$ for W. The
diagram in Fig. 2.2 will help keep the various relationships of these spaces and
mappings straight. We say that the diagram is commutative, which means that
any two paths between two points represent the same map. By selecting various
pairs of paths, we can read off all the identities which hold for the nine maps
T, T', T'', $\varphi_1$, $\varphi_2$, A, $\psi_1$, $\psi_2$, B. For example, T'' can be obtained by going
backward along A, forward along T', and then forward along B. That is,
$T'' = B \circ T' \circ A^{-1}$. Since these "outside maps" are all maps of Cartesian spaces, we
can then read off the corresponding matrix identity

$$t'' = b\,t'\,a^{-1},$$

showing how the matrix of T with respect to the second pair of bases is obtained
from its matrix with respect to the first pair.

What we have actually done in reading off the above identity from the 
diagram is to eliminate certain retraced steps in the longer path which the 
definitions would give us. Thus from the definitions we get 


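$$B \circ T' \circ A^{-1} = (\psi_2^{-1} \circ \psi_1) \circ (\psi_1^{-1} \circ T \circ \varphi_1) \circ (\varphi_1^{-1} \circ \varphi_2) = \psi_2^{-1} \circ T \circ \varphi_2 = T''.$$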


In the above situation the domain and codomain spaces were different, and
the two basis changes were independent of each other. If W = V, so that
$T \in$ Hom(V), then of course we consider only one basis change and the formula
becomes

$$t'' = a \cdot t' \cdot a^{-1}.$$
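A small numerical case makes the single-basis-change formula concrete. The following is a minimal sketch under stated assumptions: V is taken to be R^2, the first basis is the standard one, the second basis is (1, 1), (1, -1), and the helper names `matmul`, `apply`, and `inverse_2x2` are hypothetical. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction

def matmul(s, t):
    """Matrix product: entry (k, j) is the sum over i of s[k][i] * t[i][j]."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))] for k in range(len(s))]

def apply(t, x):
    """Left multiplication of the column vector x by the matrix t."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

def inverse_2x2(a):
    """Inverse of a 2-by-2 matrix by the adjugate formula (needs ad - bc != 0)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

F = Fraction
# Assumed setup: the columns of p are the second basis vectors (1, 1) and (1, -1),
# written in the first (standard) basis.
p = [[F(1), F(1)], [F(1), F(-1)]]
a = inverse_2x2(p)                     # second coordinates = a * first coordinates
t1 = [[F(2), F(1)], [F(1), F(2)]]      # matrix t' of T in the first basis

t2 = matmul(matmul(a, t1), inverse_2x2(a))   # t'' = a t' a^{-1}

x = [F(5), F(1)]                       # first-basis coordinates of a vector
y = apply(a, x)                        # its second-basis coordinates
# Applying T and then changing coordinates agrees with changing coordinates
# and then applying t''.
assert apply(t2, y) == apply(a, apply(t1, x))
```

In this example the second basis happens to consist of eigenvectors of the symmetric matrix t', so t'' comes out diagonal.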


Now consider a linear functional $F \in V^*$. If f' and f'' are its coordinate
n-tuples considered as column vectors ($n \times 1$ matrices), then the matrices of F
with respect to the two bases are the row vectors (f')* and (f'')*, as we saw
earlier. Also, there is no change of basis in the range space since here $W = \mathbb{R}$,
with its permanent natural basis vector 1. Therefore, b = e in the formula
$t'' = b\,t'\,a^{-1}$, and we have $(f'')^* = (f')^* a^{-1}$ or

$$f'' = (a^{-1})^* f'.$$

We want to compare this with the change of coordinates of a vector $\xi \in V$,
which, as we saw earlier, is given by

$$x'' = a\,x'.$$

These changes go in the opposite directions (with a transposition thrown in).
For reasons largely historical, functionals F in V* are called covariant vectors,
and since the matrix for a change of coordinates in V is the transpose of the
inverse of the matrix for the corresponding change of coordinates in V*, the
vectors $\xi$ in V are called contravariant vectors. These terms are used in classical
tensor analysis and differential geometry.

The isomorphism $\{t_{ij}\} \mapsto T$, being from a Cartesian space $\mathbb{R}^{\bar m \times \bar n}$, is
automatically a basis isomorphism. Its basis in Hom(V, W) is the image under the
isomorphism of the standard basis in $\mathbb{R}^{\bar m \times \bar n}$, where the latter is the set of
Kronecker functions $\delta^{kl}$ defined by $\delta^{kl}(i, j) = 0$ if $\langle k, l \rangle \neq \langle i, j \rangle$ and
$\delta^{kl}(k, l) = 1$. (Remember that in $\mathbb{R}^A$, $\delta^a$ is that function such that $\delta^a(b) = 0$
if $b \neq a$ and $\delta^a(a) = 1$. Here $A = \bar m \times \bar n$ and the elements a of A are ordered
pairs $a = \langle k, l \rangle$.) The function $\delta^{kl}$ is that matrix whose columns are all 0
except for the lth, and the lth column is the m-tuple $\delta^k$. The corresponding
transformation $D_{kl}$ thus takes every basis vector $\alpha_j$ to 0 except $\alpha_l$ and takes $\alpha_l$
to $\beta_k$. That is, $D_{kl}(\alpha_j) = 0$ if $j \neq l$, and $D_{kl}(\alpha_l) = \beta_k$. Again, $D_{kl}$ takes the lth
basis vector in V to the kth basis vector in W and takes the other basis vectors
in V to 0.

If $\xi = \sum x_i\alpha_i$, it follows that $D_{kl}(\xi) = x_l\beta_k$.

Since $\{D_{kl}\}$ is the basis defined by the isomorphism $\{t_{ij}\} \mapsto T$, it follows
that $\{t_{ij}\}$ is the coordinate set of T with respect to this basis; it is the image of T
under the coordinate isomorphism. It is interesting to see how this basis
expansion of T automatically appears. We have

$$T(\xi) = T\left(\sum_{j=1}^{n} x_j\alpha_j\right) = \sum_{j=1}^{n} x_j T(\alpha_j) = \sum_{i,j} x_j t_{ij}\beta_i = \sum_{i,j} t_{ij} D_{ij}(\xi),$$

so that

$$T = \sum_{i,j} t_{ij} D_{ij}.$$
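On Cartesian spaces the transformations D_kl are the matrices with a single entry 1, and the expansion above can be verified mechanically. The following is a minimal sketch with hypothetical names `unit_matrix`, `apply`, and `expand`; indices are 0-based in the code, whereas the text counts from 1.

```python
def unit_matrix(k, l, m, n):
    """Matrix of D_kl on Cartesian spaces: 1 in row k, column l, and 0 elsewhere."""
    return [[1 if (i, j) == (k, l) else 0 for j in range(n)] for i in range(m)]

def apply(t, x):
    """Left multiplication of the column vector x by the matrix t."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

def expand(t):
    """Rebuild t as the sum of t[i][j] * D_ij over all index pairs."""
    m, n = len(t), len(t[0])
    total = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = unit_matrix(i, j, m, n)
            for a in range(m):
                for b in range(n):
                    total[a][b] += t[i][j] * d[a][b]
    return total

t = [[1, 2, 0], [3, -1, 4]]
x = [2, 1, -1]

# The coordinates of t with respect to the basis {D_ij} are its entries.
assert expand(t) == t

# D_kl(x) = x_l * delta^k, here with k = 1 and l = 2 (0-based).
assert apply(unit_matrix(1, 2, 2, 3), x) == [0, x[2]]
```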


Our original discussion of the dual basis in V* was a special case of the
present situation. There we had Hom(V, $\mathbb{R}$) = V*, with the permanent
standard basis 1 for $\mathbb{R}$. The basis for V* corresponding to the basis $\{\alpha_i\}$ for V
therefore consists of those maps $D_i$ taking $\alpha_i$ to 1 and $\alpha_j$ to 0 for $j \neq i$. Then
$D_i(\xi) = D_i(\sum x_j\alpha_j) = x_i$, and $D_i$ is the ith coordinate functional $\varepsilon_i$.

Finally, we note that the matrix expression of $T \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$) is very
suggestive of the block decompositions of T that we discussed earlier in Section
1.5. In the exercises we shall ask the reader to show that in fact $T_{kl} = t_{kl}D_{kl}$.


EXERCISES 


4.1 Prove that if $\omega: V \times V \to \mathbb{R}$ is a bilinear functional on V and $T: V \to V^*$
is the corresponding linear transformation defined by $(T(\eta))(\xi) = \omega(\xi, \eta)$, then for
any basis $\{\alpha_i\}$ for V the matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$ is the matrix of T.

4.2 Verify that the row and column rank of the following matrix are both 1:

$$\begin{bmatrix} -5 & 2 & 3 \\ -10 & 4 & 6 \end{bmatrix}$$

4.3 Show by a direct calculation that if the row rank of a $2 \times 3$ matrix is 1, then so
is its column rank.

4.4 Let $\{f_i\}_1^3$ be a linearly dependent set of $C^2$-functions (twice continuously
differentiable real-valued functions) on $\mathbb{R}$. Show that the three triples
$\langle f_i(x), f_i'(x), f_i''(x) \rangle$ are dependent for any x. Prove therefore that sin t, cos t,
and $e^t$ are linearly independent. (Compute the derivative triples for a well-chosen x.)




4.5 Compute 


6 —2 3 1 
- LE 2 2] x ee = 
1 6-1 4 5 


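4.6 Compute

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \times \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

From your answer give a necessary and sufficient condition for

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}$$

to exist.


4.7 A matrix a is idempotent if $a^2 = a$. Find a basis for the vector space $\mathbb{R}^{\bar 2 \times \bar 2}$ of
all $2 \times 2$ matrices consisting entirely of idempotents.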


4.8 By a direct calculation show that

$$\begin{bmatrix} 1 & 2 \\ -2 & 3 \end{bmatrix}$$

is invertible and find its inverse.

4.9 Show by explicitly solving the equation

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x & y \\ z & w \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

that the matrix on the left is invertible if and only if (the determinant) ad - bc is not
zero.

4.10 Find a nonzero $2 \times 2$ matrix

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

whose square is zero.
4.11 Find all $2 \times 2$ matrices whose squares are zero.
4.12 Prove by computing matrix products that matrix multiplication is associative.
4.13 Similarly, prove directly the distributive law, $(r + s) \cdot t = r \cdot t + s \cdot t$.


4.14 Show that left matrix multiplication by a fixed r in $\mathbb{R}^{\bar m \times \bar n}$ is a linear
transformation from $\mathbb{R}^{\bar n \times \bar p}$ to $\mathbb{R}^{\bar m \times \bar p}$. What theorem in Chapter 1 does this mirror?

4.15 Show that the rank of a product of two matrices is at most the minimum of their
ranks. (Remember that the rank of a matrix is the dimension of the range space of its
associated T.)

4.16 Let a be an $m \times n$ matrix, and let b be $n \times m$. If m > n, show that $a \cdot b$ cannot
be the identity e ($m \times m$).




4.17 Let Z be the subset of $2 \times 2$ matrices of the form

$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix}.$$

Prove that Z is a subalgebra of $\mathbb{R}^{\bar 2 \times \bar 2}$ (that is, Z is closed under addition, scalar
multiplication, and matrix multiplication). Show that in fact Z is isomorphic to the complex
number system.
4.18 A matrix (necessarily square) which is equal to its transpose is said to be
symmetric. As a square array it is symmetric about the main diagonal. Show that for any
$m \times n$ matrix t the product $t \cdot t^*$ is meaningful and symmetric.
4.19 Show that if s and t are symmetric $n \times n$ matrices, and if they commute, then
$s \cdot t$ is symmetric. (Do not try to answer this by writing out matrix products.) Show
conversely that if s, t, and $s \cdot t$ are all symmetric, then s and t commute.
4.20 Suppose that T in Hom $\mathbb{R}^2$ has a symmetric matrix and that T is not of the
form cI. Show that T has exactly two eigenvectors (up to scalar multiples). What
does the matrix of T become with respect to the "eigenbasis" for $\mathbb{R}^2$ consisting of these
two eigenvectors?
4.21 Show that the symmetric $2 \times 2$ matrix t has a symmetric square root s ($s^2 = t$)
if and only if its eigenvalues are nonnegative. (Assume the above exercise.)


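4.22 Suppose that t is a $2 \times 2$ matrix such that $t^* = t^{-1}$. Show that t has one of
the forms

$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} a & b \\ b & -a \end{bmatrix},$$

where $a^2 + b^2 = 1$.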

4.23 Prove that multiplication by the above t is a Euclidean isometry. That is,
show that if $y = t \cdot x$, where x and $y \in \mathbb{R}^2$, then $\|x\| = \|y\|$, where $\|x\| = (x_1^2 + x_2^2)^{1/2}$.
4.24 Let $\{D_{kl}\}$ be the basis for Hom(V, W) defined in the text. Taking W = V,
show that these operators satisfy the very important multiplication rules

$$D_{ij} \circ D_{kl} = 0 \quad\text{if } j \neq k,$$
$$D_{ik} \circ D_{kl} = D_{il}.$$

4.25 Keeping the above identities in mind, show that if $l \neq m$, then there are
transformations S and T in Hom V such that

$$S \circ T - T \circ S = D_{lm}.$$

Also find S and T such that

$$S \circ T - T \circ S = D_{ll} - D_{mm}.$$

4.26 Given T in Hom $\mathbb{R}^n$, we know from Chapter 1 that $T = \sum_{i,j} T_{ij}$, where
$T_{ij} = P_i T P_j$ and $P_i = \theta_i\pi_i$. Now we also have

$$T = \sum_{k,l} t_{kl} D_{kl}.$$

Show from the definition of $D_{ij}$ in the text that $P_i D_{ij} P_j = D_{ij}$ and that $P_i D_{kl} P_j = 0$
if either $i \neq k$ or $j \neq l$. Conclude that $T_{ij} = t_{ij} D_{ij}$.




5. TRACE AND DETERMINANT


Our aim in this short section is to acquaint the reader with two very special
real-valued functions on Hom V and to describe some of their properties.


Theorem 5.1. If V is an n-dimensional vector space, there is exactly one
linear functional $\lambda$ on the vector space Hom(V) with the property that
$\lambda(S \circ T) = \lambda(T \circ S)$ for all S, T in Hom(V) and normalized so that $\lambda(I) = n$.
If a basis is chosen for V and the corresponding matrix of T is $\{t_{ij}\}$, then
$\lambda(T) = \sum_{i=1}^n t_{ii}$, the sum of the elements on the main diagonal.


Proof. If we choose a basis and define $\lambda(T)$ as $\sum_1^n t_{ii}$, then it is clear that $\lambda$ is a
linear functional on Hom(V) and that $\lambda(I) = n$. Moreover,

$$\lambda(S \circ T) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} s_{ij} t_{ji} \right) = \sum_{i,j=1}^{n} s_{ij} t_{ji} = \sum_{i,j=1}^{n} t_{ji} s_{ij} = \lambda(T \circ S).$$

That is, each basis for V gives us a functional $\lambda$ in (Hom V)* such that
$\lambda(S \circ T) = \lambda(T \circ S)$, $\lambda(I) = n$, and $\lambda(T) = \sum t_{ii}$ for the matrix representation of that basis.

Now suppose that $\mu$ is any element of (Hom(V))* such that $\mu(S \circ T) = \mu(T \circ S)$
and $\mu(I) = n$. If we choose a basis for V and use the isomorphism
$\theta: \{t_{ij}\} \mapsto T$ from $\mathbb{R}^{\bar n \times \bar n}$ to Hom V, we have a functional $\nu = \mu \circ \theta$ on $\mathbb{R}^{\bar n \times \bar n}$
($\nu = \theta^*\mu$) such that $\nu(st) = \nu(ts)$ and $\nu(e) = n$. By Theorem 4.1 (or 3.1) $\nu$ is
given by a matrix c, $\nu(t) = \sum_{i,j=1}^n c_{ij} t_{ij}$, and the equation $\nu(st - ts) = 0$
becomes $\sum_{i,j,k=1}^n c_{ij}(s_{ik} t_{kj} - t_{ik} s_{kj}) = 0$.

We are going to leave it as an exercise for the reader to show that if $l \neq m$,
then very simple special matrices s and t can be chosen so that this sum reduces
to $c_{lm} = 0$, and, by a different choice, to $c_{ll} - c_{mm} = 0$.

Together with the requirement that $\nu(e) = n$, this implies that $c_{lm} = 0$ for
$l \neq m$ and $c_{mm} = 1$ for m = 1, ..., n. That is, $\nu(t) = \sum_1^n t_{mm}$, and $\nu$ is the
$\lambda$ of the basis being used. Altogether this shows that there is a unique $\lambda$ in
(Hom V)* such that $\lambda(S \circ T) = \lambda(T \circ S)$ for all S and T and $\lambda(I) = n$, and that
$\lambda(T)$ has the diagonal evaluation as $\sum t_{ii}$ in every basis. □
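The two ingredients of the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: `trace`, `matmul`, `subtract`, and `unit_matrix` are hypothetical helpers, indices are 0-based, and the unit matrices used in the second half are one possible choice of the "very simple special matrices" mentioned above, offered as an assumption rather than as the intended solution.

```python
def matmul(s, t):
    """Matrix product: entry (k, j) is the sum over i of s[k][i] * t[i][j]."""
    return [[sum(s[k][i] * t[i][j] for i in range(len(t)))
             for j in range(len(t[0]))] for k in range(len(s))]

def subtract(a, b):
    """Entrywise difference a - b of two matrices of the same shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(t):
    """Sum of the elements on the main diagonal of a square matrix."""
    return sum(t[i][i] for i in range(len(t)))

def unit_matrix(k, l, n):
    """The n-by-n matrix with 1 in row k, column l, and 0 elsewhere."""
    return [[1 if (i, j) == (k, l) else 0 for j in range(n)] for i in range(n)]

s = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
t = [[0, 1, -1], [2, 0, 3], [1, 1, 1]]

# The trace does not see the order of the factors, although st != ts in general.
assert trace(matmul(s, t)) == trace(matmul(t, s))
assert matmul(s, t) != matmul(t, s)

# Commutators of unit matrices (one possible choice, with l = 0 and m = 2).
n, l, m = 3, 0, 2
d_ll, d_lm = unit_matrix(l, l, n), unit_matrix(l, m, n)
d_ml, d_mm = unit_matrix(m, l, n), unit_matrix(m, m, n)
assert subtract(matmul(d_ll, d_lm), matmul(d_lm, d_ll)) == d_lm
assert subtract(matmul(d_lm, d_ml), matmul(d_ml, d_lm)) == subtract(d_ll, d_mm)
```

Since any functional that agrees on st and ts must vanish on every commutator, the last two assertions show why such a functional vanishes on each off-diagonal unit matrix and takes equal values on the diagonal ones.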


This unique $\lambda$ is called the trace functional, and $\lambda(T)$ is the trace of T. It is
usually designated tr(T).

The determinant function $\Delta(T)$ on Hom V is much more complicated, and
we shall not prove that it exists until Chapter 7. Its geometric meaning is as
follows. First, $|\Delta(T)|$ is the factor by which T multiplies volumes. More
precisely, if we define a "volume" v for subsets of V by choosing a basis and using
the coordinate correspondence to transfer to V the "natural" volume on $\mathbb{R}^n$,
then, for any figure $A \subset V$, $v(T[A]) = |\Delta(T)| \cdot v(A)$. This will be spelled out in
Chapter 8. Second, $\Delta(T)$ is positive or negative according as T preserves or
reverses orientation, which again is a sophisticated notion to be explained later.
For the moment we shall list properties of $\Delta(T)$ that are related to this geometric
interpretation, and we give a sufficient number to show the uniqueness of $\Delta$.




[Fig. 2.3: a shearing in two dimensions, showing a figure and its image T[A].]


We assume that for each finite-dimensional vector space V there is a function
$\Delta$ (or $\Delta_V$ when there is any question about domain) from Hom(V) to $\mathbb{R}$ such
that the following are true:

a) $\Delta(S \circ T) = \Delta(S)\,\Delta(T)$ for any $S, T$ in Hom(V).

b) If a subspace N of V is invariant under T and T is the identity on N and
on V/N (that is, $T[\bar\alpha] = \bar\alpha$ for each coset $\bar\alpha = \alpha + N$ of N), then
$\Delta(T) = 1$. Such a T is a shearing of V along the planes parallel to N.
In two dimensions it can be pictured as in Fig. 2.3.

c) If V is a direct sum $V = M \oplus N$ of T-invariant subspaces M and N, and
if $R = T \restriction M$ and $S = T \restriction N$, then $\Delta(T) = \Delta(R)\,\Delta(S)$. More exactly,
$\Delta_V(T) = \Delta_M(R)\,\Delta_N(S)$.

d) If V is one-dimensional, so that any T in Hom(V) is simply multiplication
by a constant $c_T$, then $\Delta(T)$ is that constant $c_T$.

e) If V is two-dimensional and T interchanges a pair of independent vectors,
then $\Delta(T) = -1$. This is clearly a pure orientation-changing property.


The fact that $\Delta$ is uniquely determined by these properties will follow from
our discussion in the next section, which will also give us a process for calculating
$\Delta$. This process is efficient for dimensions greater than two, but for T in Hom($\mathbb{R}^2$)
there is a simple formula for $\Delta(T)$ which every student should know by heart.


Theorem 5.2. If T is in Hom($\mathbb{R}^2$) and $\{t_{ij}\}$ is its $2 \times 2$ matrix, then
$\Delta(T) = t_{11} t_{22} - t_{12} t_{21}$.


This is a special case of a general formula, which we shall derive in Chapter 7,
that expresses $\Delta(T)$ as a sum of $n!$ terms, each term being a product of n numbers
from the matrix of T. This formula is too complicated to be useful in computations
for large n, but for $n = 3$ it is about as easy to use as our row-reduction
calculation in the next section, and for $n = 2$ it becomes the above simple
expression. There are a few more properties of $\Delta$ with which every student
should be familiar. They will all be proved in Chapter 7.
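The 2 × 2 formula is easy to try out. The sketch below (the function names `det2` and `matmul2` and the sample matrices are ours, introduced only for illustration) evaluates it and spot-checks properties (a), (b), and (e) from the list above on particular matrices. It is a check on examples, not a proof.

```python
def det2(t):
    """The 2 x 2 determinant formula for a matrix given as a list of two rows."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]

def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

s = [[1, 2], [3, 4]]
t = [[2, -1], [5, 3]]
shear = [[1, 7], [0, 1]]   # fixes the first basis vector and every coset of its span
swap = [[0, 1], [1, 0]]    # interchanges the two standard basis vectors

print(det2(matmul2(s, t)) == det2(s) * det2(t))   # property (a) on this pair
print(det2(shear))                                # property (b)
print(det2(swap))                                 # property (e)
```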


Theorem 5.3. If T is in Hom V, then $\Delta(T^*) = \Delta(T)$. If $\theta$ is an isomorphism
from V to W and $S = \theta \circ T \circ \theta^{-1}$, then $\Delta(S) = \Delta(T)$.




Theorem 5.4. The transformation T is nonsingular (invertible) if and only
if $\Delta(T) \ne 0$.


In the next theorem we consider T in Hom $\mathbb{R}^n$, and we want to think of $\Delta(T)$
as a function of the matrix t of T. To emphasize this we shall use the notation
$D(t) = \Delta(T)$.

Theorem 5.5 (Cramer's rule). Given an $n \times n$ matrix t and an n-tuple y,
let $t \,|_j\, y$ be the matrix obtained by replacing the jth column of t by y. Then

\[ y = t \cdot x \implies D(t)\, x_j = D(t \,|_j\, y) \]

for all j.

If t is nonsingular [$D(t) \ne 0$], this becomes an explicit formula for the
solution x of the equation $y = t \cdot x$; it is theoretically important even in those
cases when it is not useful in practice (large n).
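For the 2 × 2 case Cramer's rule can be written out directly. The following is a minimal sketch under our own naming (`replace_column` plays the role of the matrix obtained by replacing the jth column of t by y, with 0-based column indices, and `cramer2` is a hypothetical helper); it uses exact rational arithmetic so that no rounding obscures the result.

```python
from fractions import Fraction

def det2(t):
    """2 x 2 determinant of a matrix given as a list of two rows."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]

def replace_column(t, j, y):
    """Copy of t with column j (0-based) replaced by the tuple y."""
    return [[y[i] if col == j else t[i][col] for col in range(len(t[i]))]
            for i in range(len(t))]

def cramer2(t, y):
    """Solve y = t . x for a nonsingular 2 x 2 matrix t by Cramer's rule."""
    d = det2(t)
    if d == 0:
        raise ValueError("t is singular; the rule gives no formula for x")
    return [Fraction(det2(replace_column(t, j, y)), d) for j in range(2)]

t = [[1, 2], [3, 4]]
y = [5, 6]
x = cramer2(t, y)
# Substitute back: each row of t applied to x should reproduce y.
print(x, [sum(t[i][j] * x[j] for j in range(2)) for i in range(2)])
```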


EXERCISES 


5.1 Finish Theorem 5.1 by applying Exercise 4.25.

5.2 It follows from our discussion of trace that $\mathrm{tr}(T) = \sum t_{ii}$ is independent of the
basis. Show that this fact follows directly from

\[ \mathrm{tr}(t \cdot s) = \mathrm{tr}(s \cdot t) \]

and the change of basis formula in the preceding section.

5.3 Show by direct computation that the function $d(t) = t_{11} t_{22} - t_{12} t_{21}$ satisfies
$d(s \cdot t) = d(s)\, d(t)$ (where s and t are $2 \times 2$ matrices). Conclude that if V is
two-dimensional and $d(T)$ is defined for T in Hom V by choosing a basis and setting
$d(T) = d(t)$, then $d(T)$ is actually independent of the basis.

5.4 Continuing the above exercise, show that $d(T) = \Delta(T)$ in any of the following
cases:

1) T interchanges two independent vectors.
2) T has two eigenvectors.
3) T has a matrix of the form
\[ \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}. \]

Show next that if T has none of the above forms, then $T = R \circ S$, where S is of type
(1) and R is of type (2) or (3). [Hint: Suppose $T(\alpha) = \beta$, with $\alpha$ and $\beta$ independent.
Let S interchange $\alpha$ and $\beta$, and consider $R = T \circ S$.] Show finally that $d(T) = \Delta(T)$
for all T in Hom V. (V is two-dimensional.)

5.5 If t is symmetric and $2 \times 2$, show that there is a $2 \times 2$ matrix s such that
$s^* = s^{-1}$, $\Delta(s) = 1$, and $s t s^{-1}$ is diagonal.

5.6 Assuming Theorem 5.2, verify Theorem 5.4 for the $2 \times 2$ case.

5.7 Assuming Theorem 5.2, verify Theorem 5.5 for the $2 \times 2$ case.




5.8 In this exercise we suppose that the reader remembers what a continuous
function of a real variable is. Suppose that the $2 \times 2$ matrix function

\[ a(t) = \begin{bmatrix} a_{11}(t) & a_{12}(t) \\ a_{21}(t) & a_{22}(t) \end{bmatrix} \]

has continuous components $a_{ij}(t)$ for $t \in (0, 1)$, and suppose that $a(t)$ is nonsingular
for every t. Show that the solution $y(t)$ to the linear equation $a(t) \cdot y(t) = x(t)$ has
continuous components $y_1(t)$ and $y_2(t)$ if the functions $x_1(t)$ and $x_2(t)$ are continuous.

5.9 A homogeneous second-order linear differential equation is an equation of the
form

\[ y'' + a_1 y' + a_0 y = 0, \]

where $a_1 = a_1(t)$ and $a_0 = a_0(t)$ are continuous functions. A solution is a $C^2$-function
f (i.e., a twice continuously differentiable function) such that $f''(t) + a_1(t) f'(t) +
a_0(t) f(t) = 0$. Suppose that f and g are $C^2$-functions [on (0, 1), say] such that the
$2 \times 2$ matrix

\[ \begin{bmatrix} f(t) & g(t) \\ f'(t) & g'(t) \end{bmatrix} \]

is always nonsingular. Show that there is a homogeneous second-order differential
equation of which they are both solutions.

5.10 In the above exercise show that the space of all solutions is a two-dimensional
vector space. That is, show that if $h(t)$ is any third solution, then h is a linear
combination of f and g.

5.11 By a “linear motion” of the Cartesian plane $\mathbb{R}^2$ into itself we shall mean a
continuous map $x \mapsto t(x)$ from [0, 1] to the set of $2 \times 2$ nonsingular matrices such that
$t(0) = e$. Show that $\Delta(t(1)) > 0$.

5.12 Show that if $\Delta(s) = 1$, then there is a linear motion whose final matrix $t(1)$ is s.


6. MATRIX COMPUTATIONS 


The computational process by which the reader learned to solve systems of
linear equations in secondary school algebra was undoubtedly “elimination by
successive substitutions”. The first equation is solved for the first unknown, and
the solution expression is substituted for the first unknown in the remaining
equations, thereby eliminating the first unknown from the remaining equations.
Next, the second equation is solved for the second unknown, and this unknown is
then eliminated from the remaining equations. In this way the unknowns are
eliminated one at a time, and a solution is obtained.
This same procedure also solves the following additional problems:


1) to obtain an explicit basis for the linear span of a set of m vectors in $\mathbb{R}^n$;
therefore, in particular,

2) to find the dimension of such a subspace;
3) to compute the determinant of an $m \times m$ matrix;
4) to compute the inverse of an invertible $m \times m$ matrix.




In this section we shall briefly study this process and the solutions to these 
problems. 

We start by noting that the kinds of changes we are going to make on a
finite sequence of vectors do not alter its span.


Lemma 6.1. Let $\{\alpha_i\}_1^m$ be any m-tuple of vectors in a vector space, and let
$\{\beta_i\}_1^m$ be obtained from $\{\alpha_i\}_1^m$ by any one of the following elementary
operations:

1) interchanging two vectors;
2) multiplying some $\alpha_i$ by a nonzero scalar;
3) replacing $\alpha_i$ by $\alpha_i - x\alpha_j$ for some $j \ne i$ and some $x \in \mathbb{R}$.

Then
\[ L(\{\beta_i\}_1^m) = L(\{\alpha_i\}_1^m). \]


Proof. If $\alpha_i' = \alpha_i - x\alpha_j$, then $\alpha_i = \alpha_i' + x\alpha_j$. Thus if $\{\beta_i\}_1^m$ is obtained from
$\{\alpha_i\}_1^m$ by one operation of type (3), then $\{\alpha_i\}_1^m$ can be obtained from $\{\beta_i\}_1^m$ by
one operation of type (3). In particular, each sequence is in the linear span of
the other, and the two linear spans are therefore the same.

Similarly, each of the other operations can be undone by one of the same
type, and the linear spans are unchanged. □


When we perform these operations on the sequence of row vectors in a
matrix, we call them elementary row operations.

We define the order of an n-tuple $x = \langle x_1, \ldots, x_n \rangle$ as the index of the
first nonzero entry. Thus if $x_i = 0$ for $i < j$ and $x_j \ne 0$, then the order of x
is j. The order of $\langle 0, 0, 0, 2, -1, 0 \rangle$ is 4.

Let $\{a_{ij}\}$ be an $m \times n$ matrix, let V be its row space, and let $n_1 < n_2 <
\cdots < n_k$ be the integers that occur as orders of nonzero vectors in V. We are
going to construct a basis for V consisting of k elements having exactly the
above set of orders.

If every nonzero row in $\{a_{ij}\}$ has order $> p$, then every nonzero vector x in
V has order $> p$, since x is a linear combination of these row vectors. Since some
vector in V has the minimal order $n_1$, it follows that some row in $\{a_{ij}\}$ has order
$n_1$. We move such a row to the top by interchanging two rows. We then multiply
this row x by a constant, so that its first nonzero entry $x_{n_1}$ is 1. Let $a^1, \ldots, a^m$
be the row vectors that we now have, so that $a^1$ has order $n_1$ and $a^1_{n_1} = 1$. We
next subtract multiples of $a^1$ from each of the other rows in such a way that the
new ith row has 0 as its $n_1$-coordinate. Specifically, we replace $a^i$ by $a^i - a^i_{n_1} \cdot a^1$
for $i > 1$. The matrix that we thus obtain has the property that its jth column
is the zero m-tuple for each $j < n_1$ and its $n_1$th column is $\delta^1$ in $\mathbb{R}^m$. Its first row
has order $n_1$, and every other row has order $> n_1$. Its row space is still V. We
again call it a.

Now let $x = \sum_1^m c_i a^i$ be a vector in V with order $n_2$. Then $c_1 = 0$, for if
$c_1 \ne 0$, then the order of x is $n_1$. Thus x is a linear combination of the second
to the mth rows, and, just as in the first case, one of these rows must therefore
have order $n_2$.

We now repeat the above process all over again, keying now on this vector.
We bring it to the second row, make its $n_2$-coordinate 1, and subtract multiples
of it from all the other rows (including the first), so that the resulting matrix
has $\delta^2$ for its $n_2$th column. Next we find a row with order $n_3$, bring it to the
third row, and make the $n_3$th column $\delta^3$, etc.

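We exhibit this process below for one $3 \times 4$ matrix. This example is dishonest
in that it has been chosen so that fractions will not occur through the
application of (2). The reader will not be that lucky when he tries his hand.
Our defense is that by keeping the matrices simple we make the process itself
more apparent.

\[
\begin{aligned}
\begin{bmatrix} 0 & -1 & 2 & 3 \\ 2 & 2 & 4 & -2 \\ 2 & 4 & 0 & 3 \end{bmatrix}
&\xrightarrow{(1)}
\begin{bmatrix} 2 & 2 & 4 & -2 \\ 0 & -1 & 2 & 3 \\ 2 & 4 & 0 & 3 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 1 & 2 & -1 \\ 0 & -1 & 2 & 3 \\ 2 & 4 & 0 & 3 \end{bmatrix} \\
&\xrightarrow{(3)}
\begin{bmatrix} 1 & 1 & 2 & -1 \\ 0 & -1 & 2 & 3 \\ 0 & 2 & -4 & 5 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 1 & 2 & -1 \\ 0 & 1 & -2 & -3 \\ 0 & 2 & -4 & 5 \end{bmatrix} \\
&\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 & 4 & 2 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & 11 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 0 & 4 & 2 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\
&\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\end{aligned}
\]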


Note that from the final matrix we can tell that the orders in the row space
are 1, 2, and 4, whereas the original matrix only displays the orders 1 and 2.

We end up with an $m \times n$ matrix having the same row space V and the
following special structure:

1) For $1 \le j \le k$ the jth row has order $n_j$.

2) If $k < m$, the remaining $m - k$ rows are zero (since a nonzero row would
have order $> n_k$, a contradiction).

3) The $n_j$th column is $\delta^j$.


It follows that any linear combination of the first k rows with coefficients
$c_1, \ldots, c_k$ has $c_j$ in the $n_j$th place, and hence cannot be zero unless all the
$c_j$'s are zero. These k rows thus form a basis for V, solving problems (1) and (2).

Our final matrix is said to be in row-reduced echelon form. It can be shown to
be uniquely determined by the space V and the above requirements relating its
rows to the orders of the elements of V. Its rows form the canonical basis of V.
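The reduction just described can be sketched in a few lines. The function below is our own illustration (the name `row_reduce` is hypothetical); it sweeps the columns from left to right rather than first searching for a row of minimal order, which comes to the same thing, and it uses exact fractions so that operation (2) introduces no rounding. It returns the row-reduced echelon matrix together with the list of orders, and is applied to the 3 × 4 matrix of the worked example.

```python
from fractions import Fraction

def row_reduce(a):
    """Row-reduced echelon form of a (list of rows) and its orders (1-based)."""
    r = [[Fraction(x) for x in row] for row in a]
    m, n = len(r), len(r[0])
    orders = []
    top = 0                                   # next row position to fill
    for col in range(n):
        # Look for a row at or below `top` with a nonzero entry in this column.
        pivot = next((i for i in range(top, m) if r[i][col] != 0), None)
        if pivot is None:
            continue
        r[top], r[pivot] = r[pivot], r[top]                 # operation (1)
        lead = r[top][col]
        r[top] = [x / lead for x in r[top]]                 # operation (2)
        for i in range(m):                                  # operation (3)
            if i != top and r[i][col] != 0:
                factor = r[i][col]
                r[i] = [x - factor * y for x, y in zip(r[i], r[top])]
        orders.append(col + 1)
        top += 1
        if top == m:
            break
    return r, orders

a = [[0, -1, 2, 3],
     [2, 2, 4, -2],
     [2, 4, 0, 3]]
r, orders = row_reduce(a)
print(orders)
for row in r:
    print([str(x) for x in row])
```

The nonzero rows of the result are the canonical basis of the row space, and the length of `orders` is its dimension.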




A typical row-reduced echelon matrix is shown in Fig. 2.4. This matrix is $8 \times 11$,
its orders are 1, 4, 5, 7, 10, and its row space has dimension 5. It is entirely 0
below the broken line. The dashes in the first five lines represent arbitrary numbers,
but any change in these remaining entries changes the spanned space V.

We shall now look for the significance of the row-reduction operations from
the point of view of general linear theory. In this discussion it will be convenient
to use the fact from Section 4 that if an n-tuple x in $\mathbb{R}^n$ is viewed as an $n \times 1$
matrix (i.e., as a column vector), then the system of linear equations $y_i =
\sum_{j=1}^{n} a_{ij} x_j$, $i = 1, \ldots, m$, expresses exactly the single matrix equation $y = a \cdot x$.
Thus the associated linear transformation $A \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$) is now viewed as
being simply multiplication by the matrix a; $y = A(x)$ if and only if $y = a \cdot x$.


[Fig. 2.4: an 8 × 11 row-reduced echelon matrix with orders 1, 4, 5, 7, 10.]


We first note that each of our elementary row operations on an $m \times n$
matrix a is equivalent to premultiplication by a corresponding $m \times m$ elementary
matrix u. Supposing for the moment that this is so, we can find out what u
is by using the $m \times m$ identity matrix e. Since $u \cdot a = (u \cdot e) \cdot a$, we see that
the result of performing the operation on the matrix a can also be obtained by
premultiplying a by the matrix $u \cdot e$. That is, if the elementary operation can
be obtained as matrix multiplication by u, then the multiplier is $u \cdot e$. This
argument suggests that we should perform the operation on e and then see if
premultiplying a by the resulting matrix performs the operation on a.

If the elementary operation is interchanging the $i_0$th and $j_0$th rows, then
performing it on e gives the matrix u with $u_{kk} = 1$ for $k \ne i_0$ and $k \ne j_0$,
$u_{i_0 j_0} = u_{j_0 i_0} = 1$, and $u_{kl} = 0$ for all other indices. Moreover, examination of
the sums defining the elements of the product matrix $u \cdot a$ will show that
premultiplying by this u does just interchange the $i_0$th and $j_0$th rows of any
$m \times n$ matrix a.

In the same way, multiplying the $i_0$th row of a by c is equivalent to
premultiplying by the matrix u which is the same as e except that $u_{i_0 i_0} = c$.
Finally, multiplying the $j_0$th row by x and adding it to the $i_0$th row is equivalent
to premultiplying by the matrix u which is the identity e except that $u_{i_0 j_0}$ is x
instead of 0.




These three elementary matrices are indicated schematically in Fig. 2.5. 
Each has the value 1 on the main diagonal and 0 off the main diagonal except as 
indicated. 


[Fig. 2.5: schematic of the three elementary matrices.]


These elementary matrices u are all nonsingular (invertible). The row interchange
matrix is its own inverse. The inverse of multiplying the jth row by c
is multiplying the same row by 1/c. And the inverse of adding c times the jth
row to the ith row is adding $-c$ times the jth row to the ith row.
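The recipe "perform the operation on e, then premultiply" is easy to try out. In the sketch below the three constructors (`interchange`, `scale`, `add_multiple`) are our own hypothetical names, and row indices are 0-based, unlike the text. Each builds the elementary matrix by modifying the identity, and the final lines check on one matrix that premultiplication carries out the operation and that the stated inverses really are inverses.

```python
def identity(m):
    """The m x m identity matrix e as a list of rows."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def interchange(m, i0, j0):
    """Elementary matrix interchanging rows i0 and j0."""
    u = identity(m)
    u[i0], u[j0] = u[j0], u[i0]
    return u

def scale(m, i0, c):
    """Elementary matrix multiplying row i0 by c."""
    u = identity(m)
    u[i0][i0] = c
    return u

def add_multiple(m, i0, j0, x):
    """Elementary matrix adding x times row j0 to row i0 (i0 != j0)."""
    u = identity(m)
    u[i0][j0] = x
    return u

a = [[0, -1, 2, 3], [2, 2, 4, -2], [2, 4, 0, 3]]
print(matmul(interchange(3, 0, 1), a))        # rows 0 and 1 of a interchanged
print(matmul(add_multiple(3, 2, 0, -1), a))   # row 2 replaced by row 2 - row 0
print(matmul(add_multiple(3, 2, 0, 5), add_multiple(3, 2, 0, -5)) == identity(3))
```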

If $u^1, u^2, \ldots, u^p$ is a sequence of elementary matrices, and if

\[ b = u^p \cdot u^{p-1} \cdot \ldots \cdot u^1, \]

then $b \cdot a$ is the matrix obtained from a by performing the corresponding
sequence of elementary row operations on a. If $u^1, \ldots, u^p$ is a sequence which
row reduces a, then $r = b \cdot a$ is the resulting row-reduced echelon matrix.

Now suppose that a is a square $m \times m$ matrix and is nonsingular (invertible).
Thus the dimension of the row space is m, and hence there are m different orders
$n_1, \ldots, n_m$. That is, $k = m$, and since $1 \le n_1 < n_2 < \cdots < n_m \le m$, we
must also have $n_i = i$, $i = 1, \ldots, m$. Remembering that the $n_i$th column in r is
$\delta^i$, we see that now the ith column in r is $\delta^i$ and therefore that r is simply the
identity matrix e. Thus $b \cdot a = e$ and b is the inverse of a.

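Let us find the inverse of

\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

by this procedure. The row-reducing sequence is

\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]

The corresponding elementary matrices are

\[
\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 \\ 0 & -\tfrac12 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}.
\]

The inverse is therefore the product

\[
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -\tfrac12 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ \tfrac32 & -\tfrac12 \end{bmatrix}
=
\begin{bmatrix} -2 & 1 \\ \tfrac32 & -\tfrac12 \end{bmatrix}.
\]

Check it if you are in doubt.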

Finally, since $b \cdot e = b$, we see that we get b from e by applying the same
row operations (gathered together as premultiplication by b) that we used to
reduce a to echelon form. This is probably the best way of computing the inverse
of a matrix. To keep track of the operations, we can place e to the right of a to
form a single $m \times 2m$ matrix $a \,|\, e$, and then row reduce it. In echelon form it
will then be the $m \times 2m$ matrix $e \,|\, b$, and we can read off the inverse b of the
original matrix a.

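Let us recompute the inverse of

\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

by this method. We row reduce

\[ \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right], \]

getting

\[
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right]
\xrightarrow{(3)}
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array}\right]
\xrightarrow{(2)}
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & \tfrac32 & -\tfrac12 \end{array}\right]
\xrightarrow{(3)}
\left[\begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & 1 & \tfrac32 & -\tfrac12 \end{array}\right],
\]

from which we read off the inverse to be

\[ \begin{bmatrix} -2 & 1 \\ \tfrac32 & -\tfrac12 \end{bmatrix}. \]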
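The juxtaposition method translates directly into a short routine. What follows is a sketch under our own naming (`inverse` is a hypothetical helper, not notation from the text); it assumes a square input, works in exact fractions, and reports a singular matrix by raising an error when no usable row can be found for some column.

```python
from fractions import Fraction

def inverse(a):
    """Invert a square matrix by row reducing the juxtaposition a | e."""
    m = len(a)
    # Build a | e: each row of a followed by the matching row of the identity.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
           for i, row in enumerate(a)]
    for col in range(m):
        pivot = next((i for i in range(col, m) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]          # operation (1)
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]              # operation (2)
        for i in range(m):                                   # operation (3)
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    # The left half is now e, so the right half is the inverse b.
    return [row[m:] for row in aug]

b = inverse([[1, 2], [3, 4]])
for row in b:
    print([str(x) for x in row])
```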


Finally we consider the problem of computing the determinant of a square
$m \times m$ matrix. We use two elementary operations (one modified) as follows:

1') interchanging two rows and simultaneously changing the sign of one of
them;

3) as before, replacing some row $\alpha_i$ by $\alpha_i - x\alpha_j$ for some $j \ne i$.

When applied to the rows of a square matrix, these operations leave the determinant
unchanged. This follows from the properties of determinants listed in
Section 5, and its proof will be left as an exercise. Moreover, these properties
will be trivial consequences of our definition of a determinant in Chapter 7.

Consider, then, a square $m \times m$ matrix $\{a_{ij}\}$. We interchange the first
and pth rows to bring a row of minimal order $n_1$ to the top, and change the sign
of the row being moved down (the first row here). We do not make the leading
coefficient of the new first row 1; this elementary operation is not being used
now. We do subtract multiples of the first row from the remaining rows, in order
to make all the remaining entries in the $n_1$th column 0. The $n_1$th column is now
$c_1 \delta^1$, where $c_1$ is the leading coefficient in the first row. And the new matrix has
the same determinant as the original matrix.

We continue as before, subject to the above modifications. We change the
sign of a row moved downward in an interchange, we do not make leading
coefficients 1, and we do clear out the $n_j$th column so that it becomes $c_j \delta^j$,
where $c_j$ is the leading coefficient of the jth row ($1 \le j \le k$). As before, the
remaining $m - k$ rows are 0 (if $k < m$). Let us call this resulting matrix
semireduced. Note that we can find the corresponding reduced echelon matrix
from it by k applications of (2); we simply multiply the jth row by $1/c_j$ for
$j = 1, \ldots, k$. If s is the semireduced matrix which we obtained from a using
(1') and (3), then we shall show below that its determinant, and therefore the
determinant of a also, is the product of the entries on the main diagonal: $\prod_{i=1}^{m} s_{ii}$.
Recapitulating, we can compute the determinant of a square matrix a by using
the operations (1') and (3) to change a to a semireduced matrix s, and then
taking the product of the numbers on the main diagonal of s.

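If we apply this process to

\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

we get

\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix}
\xrightarrow{(3)}
\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}
\]

and the determinant is $1 \cdot (-2) = -2$. Our $2 \times 2$ determinant formula, applied
to

\[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \]

gives $1 \cdot 4 - 2 \cdot 3 = 4 - 6 = -2$.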
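The semireduction procedure can be sketched as follows. The function name `determinant` and the column-by-column sweep are our own choices; the routine uses only operation (1') (an interchange that changes the sign of the row moved down) and operation (3), never rescales a row, and finally multiplies the diagonal entries of the semireduced matrix.

```python
from fractions import Fraction

def determinant(a):
    """Determinant of a square matrix via the semireduced form."""
    r = [[Fraction(x) for x in row] for row in a]
    m = len(r)
    top = 0
    for col in range(m):
        pivot = next((i for i in range(top, m) if r[i][col] != 0), None)
        if pivot is None:
            continue
        if pivot != top:
            # Operation (1'): interchange, negating the row that moves down.
            r[top], r[pivot] = r[pivot], [-x for x in r[top]]
        for i in range(m):
            # Operation (3): clear the rest of this column.
            if i != top and r[i][col] != 0:
                factor = r[i][col] / r[top][col]
                r[i] = [x - factor * y for x, y in zip(r[i], r[top])]
        top += 1
    product = Fraction(1)
    for i in range(m):
        product *= r[i][i]
    return product

print(determinant([[1, 2], [3, 4]]))
print(determinant([[1, 2], [2, 4]]))   # a singular matrix
```

When the matrix is singular, the last row of the semireduced matrix is zero, so the diagonal product vanishes without any special handling.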

If the original matrix $\{a_{ij}\}$ is nonsingular, so that $k = m$ and $n_i = i$ for
$i = 1, \ldots, m$, then the jth column in the semireduced matrix is $c_j \delta^j$, so that
$s_{jj} = c_j$, and we are claiming that the determinant is the product $\prod_{i=1}^{m} c_i$ of the
leading coefficients.

To see this, note that if T is the transformation in Hom($\mathbb{R}^m$) corresponding
to our semireduced matrix, then $T(\delta^j) = c_j \delta^j$, so that $\mathbb{R}^m$ is the direct sum of m
T-invariant, one-dimensional subspaces, on the jth of which T is multiplication
by $c_j$. It follows from (c) and (d) of our list of determinant properties that
$\Delta(T) = \prod_1^m c_j = \prod_1^m s_{jj}$. This is nonzero.

On the other hand, if $\{a_{ij}\}$ is singular, so that $k = d(V) < m$, then the mth
row in the semireduced matrix is 0 and, in particular, $s_{mm} = 0$. The product
$\prod_i s_{ii}$ is thus zero. Now, without altering the main diagonal, we can subtract
multiples of the columns containing the leading row entries (the columns with
indices $n_j$) to make the mth column a zero column. This process is equivalent
to postmultiplying by elementary matrices of type (3) and, therefore, again
leaves the determinant unchanged. But now the transformation S of this matrix
leaves $\mathbb{R}^{m-1}$ invariant (as the span of $\delta^1, \ldots, \delta^{m-1}$ in $\mathbb{R}^m$) and takes $\delta^m$ to 0,
so that $\Delta(S) = 0$ by (c) in the list of determinant properties. So again the
determinant is the product of the entries on the main diagonal of the semireduced
matrix, zero in this case.

We have also found that a matrix is nonsingular (invertible) if and only if its
determinant is nonzero.


EXERCISES 


6.1 Compute the canonical basis of the row space of 


1 2 1 2 
ce oe oe | 

-1 -3 0 4 
0 4-1 —3 


6.2 Do the same for 


~1 —2 6 2 


6.3 Do the same for the above matrix but with a different first choice. 
6.4 Calculate the inverse of 


1 2 3 
2 3 4 
3 4 7 


by row reduction. Check your answer by multiplication. 
6.5 Row reduce

\[ \begin{bmatrix} 1 & 2 & 3 & y_1 \\ 2 & 3 & 4 & y_2 \\ 3 & 4 & 7 & y_3 \end{bmatrix}. \]


How does the fourth column in the row-reduced matrix compare with the inverse of 


1 2 3 
2 3 4 
3 4 7 


computed in the above exercise? Explain. 


6.6 Check whether or not <1,1,1,1>, <1,2,3,4>, <0,1,0,1>, and 
<4, 3, 2,1> are linearly independent by row reducing. Part of one of the row-reduc- 
ing operations is unnecessary for this check. What is it? 




6.7 Let us call a k-tuple of vectors $\{\alpha_i\}_1^k$ in $\mathbb{R}^n$ canonical if the $k \times n$ matrix a with
$\alpha_i$ as its ith row for all i is in row-reduced echelon form. Supposing that an n-tuple $\xi$
is in the row space of a, we can read off what its coordinates are with respect to the
above canonical basis. What are they? How then can we check whether or not an
arbitrary n-tuple $\xi$ is in the row space?

6.8 Use the device of row reducing, as suggested in the above exercise, to determine
whether or not $\delta^1 = \langle 1, 0, 0, 0 \rangle$ is in the span of $\langle 1, 1, 1, 1 \rangle$, $\langle 1, 2, 3, 4 \rangle$, and
$\langle 2, 0, 1, -1 \rangle$. Do the same for $\langle 1, 2, 1, 2 \rangle$, and also for $\langle 1, 1, 0, 4 \rangle$.

6.9 Supposing that $a \ne 0$, show that

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if $ad - bc \ne 0$ by reducing the matrix to echelon form.


6.10 Let a be an $m \times n$ matrix, and let u be the nonsingular matrix that row reduces
a, so that $r = u \cdot a$ is the row-reduced echelon matrix obtained from a. Suppose that r
has $m - k > 0$ zero rows at the bottom (the kth row being nonzero). Show that the
bottom $m - k$ rows of u span the annihilator $(\text{range } A)^\circ$ of the range of A. That is,
$y = a \cdot x$ for some x if and only if

\[ \sum_{i=1}^{m} c_i y_i = 0 \]

for each m-tuple c in the bottom $m - k$ rows of u. [Hint: The bottom row of r is
obtained by applying the bottom row of u to the columns of a.]


6.11 Remember that we find the row-reducing matrix u by applying to the $m \times m$
identity matrix e the row operations that reduce a to r. That is, we row reduce the
$m \times (n + m)$ juxtaposition matrix $a \,|\, e$ to $r \,|\, u$. Assuming the result stated in the
above exercise, find the range of $A \in$ Hom($\mathbb{R}^3$) as the null space of a functional if the
matrix of A is

\[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 5 & 7 \end{bmatrix}. \]

6.12 Similarly, find the range of A if the matrix of A is

\[ \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 0 & 2 & 1 \\ 4 & 2 & 3 \end{bmatrix}. \]


6.13 Let a be an $m \times n$ matrix, and let a be row reduced to r. Let A and R be the
corresponding operators in Hom($\mathbb{R}^n$, $\mathbb{R}^m$) [so that $A(x) = a \cdot x$]. Show that A and R
have the same null space and that A* and R* have the same range space.

6.14 Show that solving a system of m linear equations in n unknowns is equivalent
to solving a matrix equation

\[ k = t \cdot x \]

for the n-tuple x, given the $m \times n$ matrix t and the m-tuple k. Let $T \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$)
be multiplication by t. Review the possibilities for a solution from our general linear
theory for T (range, null space, affine subspace).




6.15 Let $b = c \,|\, d$ be the $m \times (n + p)$ matrix obtained by juxtaposing the $m \times n$
matrix c and the $m \times p$ matrix d. If a is an $l \times m$ matrix, show that

\[ a \cdot b = ac \,|\, ad. \]

State the similar result concerning the expression of b as the juxtaposition of n column
m-tuples. State the corresponding theorem for the “distributivity” of right multiplication
over juxtaposition.

6.16 Let a be an $m \times n$ matrix and k a column m-tuple. Let $b \,|\, l$ be the $m \times (n + 1)$
matrix obtained from the $m \times (n + 1)$ juxtaposition matrix $a \,|\, k$ by row reduction.
Show that $a \cdot x = k$ if and only if $b \cdot x = l$. Show that there is a solution x if and only
if every row that is zero in b is zero in l. Restate this condition in terms of the notion
of row rank.

6.17 Let b be the row-reduced echelon matrix obtained from an $m \times n$ matrix a.
Thus $b = u \cdot a$, where u is nonsingular, and B and A have the same null space (where
$B \in$ Hom($\mathbb{R}^n$, $\mathbb{R}^m$) is multiplication by b). We can read off from b a basis for a
subspace $W \subset \mathbb{R}^n$ such that $B \restriction W$ is an isomorphism onto range B. What is this basis?
We then know that the null space N of B is a complement of W. One complement of W,
call it M, can be read off from W. What is M?

6.18 Continuing the above exercise, show that for each standard basis vector $\delta^i$ in M
we can read off from the matrix b a vector $\alpha_i$ in W such that $\delta^i - \alpha_i \in N$. Show that
these vectors $\{\delta^i - \alpha_i\}$ form a basis for N.

6.19 We still have to show that the modified elementary row operations leave the
determinant of a square matrix unchanged, assuming the properties (a) through (e)
from Section 5. First, show from (a), (c), (d), and (e) that if T in Hom $\mathbb{R}^2$ is defined
by $T(\delta^1) = \delta^2$ and $T(\delta^2) = -\delta^1$, then $\Delta(T) = 1$. Do this by a very simple
factorization, $T = R \circ S$, where (e) can be applied to S. Conclude that a type (1') elementary
matrix has determinant 1.

6.20 Show from the determinant property (b) that an elementary matrix of type (3)
has determinant 1. Show, therefore, that the modified elementary row operations on a
square matrix leave its determinant unchanged.


*7. THE DIAGONALIZATION OF A QUADRATIC FORM 


As we mentioned earlier, one of the crucial problems of linear algebra is the
analysis of the “structure” of a linear transformation T in Hom V. From the
point of view of bases, every theorem in this area asserts that with the choice
of a special basis for V the matrix of T can be given a such-and-such simple
form. This is a very difficult part of the subject, and we are only making contact
with it in this book, although Theorem 5.5 of Chapter 1 and its corollary
form a cornerstone of the structural results.

In this section we are going to solve a simpler problem. In the above language
it is the problem of choosing a basis for V making simple the matrix of a
transformation T in Hom(V, V*). Such a transformation is equivalent to a
bilinear functional on V (by Theorem 6.1 of Chapter 1 and Theorem 3.2 of this
chapter); we shall tackle the problem in this setting.




Let V be a finite-dimensional real vector space, and let $\omega: V \times V \to \mathbb{R}$ be
a bilinear functional. If $\{\alpha_i\}_1^n$ is a basis for V, then $\omega$ determines a matrix
$t_{ij} = \omega(\alpha_i, \alpha_j)$. We know that if $\omega_\eta(\xi) = \omega(\xi, \eta)$, then $\omega_\eta \in V^*$ and $\eta \mapsto \omega_\eta$ is a
linear mapping T from V to V*. We leave it as an exercise for the reader to
show that $\{t_{ij}\}$ is the matrix of T with respect to the basis $\{\alpha_i\}$ for V and its
dual basis for V* (Exercise 4.1).

If $\xi = \sum_1^n x_i \alpha_i$ and $\eta = \sum_1^n y_j \alpha_j$, then

\[ \omega(\xi, \eta) = \sum_{i,j} x_i y_j\, \omega(\alpha_i, \alpha_j) = \sum_{i,j} t_{ij} x_i y_j. \]

In particular, if we set $q(\xi) = \omega(\xi, \xi)$, then $q(\xi) = \sum_{i,j} t_{ij} x_i x_j$ is a homogeneous
quadratic polynomial in the coordinates $x_i$.

For the rest of this section we assume that $\omega$ is symmetric: $\omega(\xi, \eta) =
\omega(\eta, \xi)$. Then we can recover $\omega$ from the quadratic form q by

\[ \omega(\xi, \eta) = \frac{q(\xi + \eta) - q(\xi - \eta)}{4}, \]

as the reader can easily check. In particular, if the bilinear form $\omega$ is not
identically zero, then there are vectors $\xi$ such that $q(\xi) = \omega(\xi, \xi) \ne 0$.
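The check is a short expansion using bilinearity and symmetry. We record it here as a sketch, in the notation of the text only:

```latex
\begin{aligned}
q(\xi + \eta) - q(\xi - \eta)
  &= \bigl[\omega(\xi,\xi) + 2\,\omega(\xi,\eta) + \omega(\eta,\eta)\bigr]
   - \bigl[\omega(\xi,\xi) - 2\,\omega(\xi,\eta) + \omega(\eta,\eta)\bigr] \\
  &= 4\,\omega(\xi,\eta).
\end{aligned}
```

Consequently, if q vanished at every vector, then $\omega$ would vanish identically, which is the contrapositive of the last remark.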

What we want to do is to show that we can find a basis $\{\alpha_i\}_1^n$ for V such that
$\omega(\alpha_i, \alpha_j) = 0$ if $i \ne j$ and $\omega(\alpha_i, \alpha_i)$ has one of the three values $0, \pm 1$. Borrowing
from the standard usage of scalar product theory (see Chapter 5), we say
that such a basis is orthonormal. Our proof that an orthonormal basis exists will
be an induction on $n = \dim V$. If $n = 1$, then any nonzero vector $\beta$ is a basis,
and if $\omega(\beta, \beta) \ne 0$, then we can choose $\alpha = x\beta$ so that $x^2 \omega(\beta, \beta) = \omega(\alpha, \alpha) =
\pm 1$, the required value of x obviously being $x = |\omega(\beta, \beta)|^{-1/2}$. In the general
case, if $\omega$ is the zero functional, then any basis will trivially be orthonormal, and
we can therefore suppose that $\omega$ is not identically 0. Then there exists a $\beta$ such
that $\omega(\beta, \beta) \ne 0$, as we noted earlier. We set $\alpha_n = x\beta$, where x is chosen to
make $q(\alpha_n) = \omega(\alpha_n, \alpha_n) = \pm 1$. The nonzero linear functional $f(\xi) = \omega(\xi, \alpha_n)$
has an $(n - 1)$-dimensional null space N, and if we let $\omega'$ be the restriction of
$\omega$ to $N \times N$, then $\omega'$ has an orthonormal basis $\{\alpha_i\}_1^{n-1}$ by the inductive hypothesis.
Also $\omega(\alpha_i, \alpha_n) = \omega(\alpha_n, \alpha_i) = 0$ if $i < n$, because $\alpha_i$ is in the null space of f,
and $\alpha_n$ is not in N, because $f(\alpha_n) = \pm 1$.
Therefore, $\{\alpha_i\}_1^n$ is an orthonormal basis for $\omega$, and we have reached our goal:


Theorem 7.1. If $\omega$ is a symmetric bilinear functional on a finite-dimensional
real vector space V, then V has an $\omega$-orthonormal basis.


For an $\omega$-orthonormal basis the expansion $\omega(\xi, \eta) = \sum x_i y_j\, \omega(\alpha_i, \alpha_j)$ reduces to

\[ \omega(\xi, \eta) = \sum_{i=1}^{n} x_i y_i\, q(\alpha_i), \]

where $q(\alpha_i) = \pm 1$ or 0. If we let $V_1$ be the span of those basis vectors $\alpha_i$ for
which $q(\alpha_i) = 1$, and similarly for $V_{-1}$ and $V_0$, then we see that $q(\xi) > 0$ for
every nonzero $\xi$ in $V_1$, $q(\xi) < 0$ for every nonzero vector $\xi$ in $V_{-1}$, and $q = 0$
on $V_0$. Furthermore, $V = V_1 \oplus V_{-1} \oplus V_0$, and the three subspaces are
$\omega$-orthogonal to each other (which means that $\omega(\xi, \eta) = 0$ if $\xi \in V_1$ and
$\eta \in V_{-1}$, etc.). Finally, $q(\xi) \le 0$ for every $\xi$ in $V_{-1} \oplus V_0$.

If we choose another orthonormal basis $\{\beta_i\}$ and let $W_1$, $W_{-1}$, and $W_0$ be
its corresponding subspaces, then $W_1$ may be different from $V_1$, but their dimensions
must be the same. For $W_1 \cap (V_{-1} \oplus V_0) = \{0\}$, since any nonzero $\xi$
in this intersection would yield the contradictory inequalities $q(\xi) > 0$ and
$q(\xi) \le 0$. Thus $W_1$ can be extended to a complement of $V_{-1} \oplus V_0$, and since
$V_1$ is a complement, we have $d(W_1) \le d(V_1)$. Similarly, $d(V_1) \le d(W_1)$,
and the dimensions therefore are equal. Incidentally, this shows that $W_1$ is a
complement of $V_{-1} \oplus V_0$. In exactly the same way, we find that $d(W_{-1}) =
d(V_{-1})$ and finally, by subtraction, that $d(W_0) = d(V_0)$. It is conventional to
reorder an $\omega$-orthonormal basis $\{\alpha_i\}$ so that all the $\alpha_i$'s with $q(\alpha_i) = 1$ come first,
then those with $q(\alpha_i) = -1$, and finally those with $q(\alpha_i) = 0$. Our results
above can then be stated as follows:


Theorem 7.2. If $\omega$ is a symmetric bilinear functional on a finite-dimensional
space V, then there are integers n and p such that if $\{\alpha_i\}$ is any $\omega$-orthonormal
basis in conventional order, and if $\xi = \sum_i x_i \alpha_i$, then

\[
q(\xi) = x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_{p+n}^2
       = \sum_{i=1}^{p} x_i^2 - \sum_{i=p+1}^{p+n} x_i^2.
\]

The integer $p - n$ is called the signature of the form q (or its associated
symmetric bilinear functional $\omega$), and $p + n$ is its rank. Note that $p + n$ is the
dimension of the column space of the above matrix of q, and hence equals the
dimension of the range of the related linear map T. Therefore, $p + n$ is the
rank of every matrix of q.

An inductive proof that an orthonormal basis exists doesn’t show us how to 
find one in practice. Let us suppose that we have the matrix {#;;} of w with 
respect to some basis {a;}} before us, so that (£2) = D0 riyjti;, where 
f= Vixen a = LY yen, and 4; = w(a;, a;), and we want to know how to go 
about actually finding an orthonormal basis {6;}}. The main problem is to find 
an orthegonal basis; normalization is then trivial. The first objective is to find 
a vector 8 such that w(8, 8) # 0. If some &; = w(a:, a;) is not zero, we can take 
6 =a; If all ¢;; = 0 and the form w is not the zero form, there must be some 
liz ¥ 0, say fyg ~ 0. If we set ¥,) = a, + ag and ¥; = a; for? > 1, then {7} 
is a basis, and the matrix s = {s,;} of w with respect to the basis {7;} has 


$11 = W(Y1, V1) = wley + ag, ay + ag) = t1, + 2tyz + tez = 212 ¥ 0. 


Similarly, $s_{ij} = t_{ij}$ if both $i$ and $j$ are greater than 1, while
$s_{1j} = s_{j1} = t_{1j} + t_{2j}$ for $j > 1$.
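For example, if $\omega$ is the bilinear form on $\mathbb{R}^2$ defined by $\omega(x, y) = x_1 y_2 +
x_2 y_1$, then its matrix $t_{ij} = \omega(\delta^i, \delta^j)$ is

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\]

and we must change the basis to get $t_{11} \ne 0$. According to the above scheme,
we set $\gamma_1 = \delta^1 + \delta^2$ and $\gamma_2 = \delta^2$ and get the new matrix $s_{ij} = \omega(\gamma_i, \gamma_j)$,
which works out to

\[
\begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}.
\]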
The next step is to find a basis for the null space of the functional
$\omega(\xi, \gamma_1) = \sum x_i s_{i1}$. We do this by modifying $\gamma_2, \ldots, \gamma_n$; we replace $\gamma_j$ by
$\gamma_j + c\gamma_1$ and calculate $c$ so that this vector is in the null space. Therefore,
we want $0 = \omega(\gamma_j + c\gamma_1, \gamma_1) = s_{j1} + c s_{11}$, and so $c = -s_{j1}/s_{11}$. Note that we
cannot take this orthogonalizing step until we have made $s_{11} \ne 0$. The new set
still spans and thus is a basis, and the new matrix $\{r_{ij}\}$ has $r_{11} \ne 0$ and
$r_{1j} = r_{j1} = 0$ for $j > 1$. We now simply repeat the whole procedure for the
restriction of $\omega$ to this $(n - 1)$-dimensional null space, with matrix
$\{r_{ij} : 2 \le i, j \le n\}$, and so on. This is a long process, but until we normalize,
it consists only of rational operations on the original matrix. We add, subtract,
multiply, and divide, but we do not have to find roots of polynomial equations.
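
The procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function names `orthogonalize` and `signature_and_rank` are hypothetical, the form is assumed to be given by a symmetric matrix of rational entries, and exact `Fraction` arithmetic is used to mirror the remark that only rational operations are needed. When a diagonal entry vanishes, the sketch first looks for a later basis vector with nonzero diagonal value (a reordering, which the text leaves implicit) before falling back on the sum $\gamma_1 = \alpha_1 + \alpha_2$.

```python
from fractions import Fraction


def orthogonalize(t):
    """Return (basis, diag) for the symmetric matrix t of a bilinear form w.

    basis[k] lists the coordinates of the k-th new basis vector in the
    original basis, and diag[k] is w(basis[k], basis[k]).
    """
    n = len(t)
    t = [[Fraction(x) for x in row] for row in t]
    vecs = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def w(u, v):
        # Value of the bilinear form on coordinate vectors u and v.
        return sum(u[i] * t[i][j] * v[j] for i in range(n) for j in range(n))

    for k in range(n):
        rest = range(k + 1, n)
        if w(vecs[k], vecs[k]) == 0:
            swap = next((j for j in rest if w(vecs[j], vecs[j]) != 0), None)
            mix = next((j for j in rest if w(vecs[k], vecs[j]) != 0), None)
            if swap is not None:
                # A later vector already has nonzero "square": bring it forward.
                vecs[k], vecs[swap] = vecs[swap], vecs[k]
            elif mix is not None:
                # All remaining squares vanish, so w(u + v, u + v) = 2 w(u, v) != 0.
                vecs[k] = [a + b for a, b in zip(vecs[k], vecs[mix])]
            else:
                continue  # vecs[k] is orthogonal to every remaining vector
        pivot = w(vecs[k], vecs[k])
        for j in rest:
            # Orthogonalizing step: c = -s_j1 / s_11 in the notation of the text.
            c = -w(vecs[j], vecs[k]) / pivot
            vecs[j] = [a + c * b for a, b in zip(vecs[j], vecs[k])]

    return vecs, [w(v, v) for v in vecs]


def signature_and_rank(diag):
    """Signature p - n and rank p + n read off from the diagonal values."""
    p = sum(1 for d in diag if d > 0)
    n = sum(1 for d in diag if d < 0)
    return p - n, p + n
```

Applied to the matrix with rows (0, 1) and (1, 0), the sketch produces the basis vectors (1, 1) and (-1/2, 1/2) with diagonal values 2 and -1/2.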

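Continuing our above example, we set $\beta_1 = \gamma_1$, but we have to replace $\gamma_2$
by $\beta_2 = \gamma_2 - (s_{12}/s_{11})\gamma_1 = \gamma_2 - \tfrac{1}{2}\gamma_1$. The final matrix $r_{ij} = \omega(\beta_i, \beta_j)$
has

\[
r_{11} = s_{11} = 2, \qquad
r_{12} = \omega(\beta_1, \beta_2) = \omega(\gamma_1, \gamma_2 - \tfrac{1}{2}\gamma_1) = s_{12} - \tfrac{1}{2} s_{11} = 0,
\]

and $r_{22} = \omega(\gamma_2 - \tfrac{1}{2}\gamma_1, \gamma_2 - \tfrac{1}{2}\gamma_1) = s_{22} - s_{12} + s_{11}/4 = -\tfrac{1}{2}$, so that

\[
r = \begin{bmatrix} 2 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}.
\]

The final basis is $\beta_1 = \gamma_1 = \delta^1 + \delta^2$ and $\beta_2 = \gamma_2 - \tfrac{1}{2}\gamma_1 = \delta^2 - \tfrac{1}{2}(\delta^1 + \delta^2) =
(\delta^2 - \delta^1)/2$.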

The steps we had to take above are reminiscent of row reduction, but since 
we are changing bases simultaneously in the domain and range spaces of the 
transformation $T: V \to V^*$ associated with $\omega$, each step involves simultaneously
premultiplying and postmultiplying by an elementary matrix. That is, we are 
simultaneously row and column reducing. It should be intuitively clear that this 
has to be the case if we are to operate on a symmetric matrix in such a way as 
to keep it symmetric. 

For additional information about quadratic forms, we go back to the change
of basis formula for the matrix of a transformation: $t'' = b \cdot t' \cdot a^{-1}$. Here the
transformation $T$ associated with the form is from $V$ to $V^*$, and so $b = (a^*)^{-1}$,
according to our calculations in Section 4. Now one of the properties of the
determinant function is that $\Delta(T^*) = \Delta(T)$, and so $\Delta(a^*) = \Delta(a)$. Therefore,
if $t$ and $s$ are the matrices of a quadratic form with respect to a first and second
basis in $V$, and if $a$ is the change of basis matrix, then $s = (a^*)^{-1} \cdot t \cdot a^{-1}$ and
$\Delta(s) = (\Delta(a^{-1}))^2 \Delta(t)$. Therefore, a quadratic form has parity. If it is
nonsingular, then its determinant is either always positive or always negative, and
we can call it even or odd. In our continuing example, the beginning and final
matrices

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\qquad\text{and}\qquad
\begin{bmatrix} 2 & 0 \\ 0 & -\tfrac{1}{2} \end{bmatrix}
\]

both have determinant $-1$.

In the two-dimensional case, the determinant of a form with respect to an
orthonormalized basis is $+1$ if the diagonal elements are both $+1$ or both $-1$,
and $-1$ if they are of opposite sign. We can therefore read off the signature of a
nonsingular form over a two-dimensional space without orthonormalizing. If the
determinant $t_{11}t_{22} - (t_{12})^2$ is positive, the signature is $\pm 2$, and we can
determine which by looking at $t_{11}$ (since $t_{11}$ is then unchanged by our orthogonalizing
procedure). Thus the signature is $+2$ or $-2$ depending on whether $t_{11} > 0$ or
$t_{11} < 0$. If the determinant is negative, then the signature is 0. Thus the
signature of the form $\omega(x, y) = x_1 y_2 + x_2 y_1$, with matrix

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\]

is known to be 0, without any calculation.
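
The two-dimensional rule is short enough to state as code. The following is a minimal sketch under the assumption that the form is given by the three entries of its symmetric matrix; the function name `signature_2d` is hypothetical and not from the text.

```python
def signature_2d(t11, t12, t22):
    """Signature of a nonsingular form with matrix [[t11, t12], [t12, t22]]."""
    det = t11 * t22 - t12 * t12
    if det == 0:
        raise ValueError("the form is singular; the rule does not apply")
    if det < 0:
        return 0  # diagonal entries of opposite sign after diagonalizing
    # det > 0 forces t11 * t22 > 0, so t11 is nonzero and decides the sign.
    return 2 if t11 > 0 else -2
```

For the form $x_1 y_2 + x_2 y_1$ the call `signature_2d(0, 1, 0)` returns 0, in agreement with the paragraph above.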

Theorems 7.1 and 7.2 are important for the classification of critical points
of real-valued functions on vector spaces. We shall see in Section 3.16 that the
second differential of such a function $F$ is a symmetric bilinear functional, and
that the signature of its form has the same significance in determining the
behavior of $F$ near a point at which its first differential is zero that the sign of
the second derivative has in the elementary calculus.

A quadratic form $q$ is said to be definite if $q(\xi)$ is never zero except for $\xi = 0$.
Then $q(\xi)$ must always have the same sign, and $q$ is accordingly called positive
definite or negative definite. Looking back to Theorem 7.2, it should be obvious
that $q$ is positive definite only if $p = d(V)$ and $n = 0$, and negative definite
only if $n = d(V)$ and $p = 0$. A symmetric bilinear functional whose associated
quadratic form is positive definite is called a scalar product. This is a very
important notion on general vector spaces, and the whole of Chapter 5 is
devoted to developing some of its implications.


CHAPTER 3 


THE DIFFERENTIAL CALCULUS 


Our algebraic background is now adequate for the differential calculus, but we
still need some multidimensional limit theory. Roughly speaking, the differ- 
ential calculus is the theory of linear approximations to nonlinear mappings, 
and we have to know what we mean by approximation in general vector settings. 
We shall therefore start this chapter by studying the notion of a measure of 
length, called a norm, for the vectors in a vector space V. We can then study 
the phenomenon suggested by the way in which a tangent plane to a surface 
approximates the surface near the point of tangency. This is the general theory 
of unique local linear approximations of mappings, called differentials. The 
collection of rules for computing differentials includes all the familiar laws of 
the differential calculus, and achieves the same goal of allowing complicated 
calculations to be performed in a routine way. However, the theory is richer 
in the multidimensional setting, and one new aspect which we must master is 
the interplay between the linear transformations which are differentials and their 
evaluations at given vectors, which are directional derivatives in general and 
partial derivatives when the vectors belong to a basis. In particular, when the 
spaces in question are finite-dimensional and are replaced by Cartesian spaces 
through a choice of bases, then the differential is entirely equivalent to its matrix,
which is a certain matrix of partial derivatives called the Jacobian matrix of the 
mapping. Then the rules of the differential calculus are expressed in terms of 
matrix operations. 

Maximum and minimum points of real-valued functions are found exactly 
as before, by computing the differential and setting it equal to zero. However, 
we shall neglect this subject, except in starred sections. It also is much richer 
than its one-variable counterpart, and in certain infinite-dimensional situations 
it becomes the subject called the calculus of variations.

Finally, we shall begin our study of the inverse-mapping theorem and the
implicit-function theorem. The inverse-mapping theorem states that if a mapping
between vector spaces is continuously differentiable, and if its differential at a
point $\alpha$ is invertible (as a linear transformation), then the mapping itself is
invertible in the neighborhood of $\alpha$. The implicit-function theorem states that if
a continuously differentiable vector-valued function $G$ of two vector variables
is set equal to zero, and if the second partial differential of $G$ is invertible (as a
linear mapping) at a point $\langle \alpha, \beta \rangle$ where $G(\alpha, \beta) = 0$, then the equation
$G(\xi, \eta) = 0$ can be solved for $\eta$ in terms of $\xi$ near this point. That is, there is a
uniquely determined mapping $\eta = F(\xi)$ defined near $\alpha$ such that $\beta = F(\alpha)$ and
such that $G(\xi, F(\xi)) = 0$ in the neighborhood of $\alpha$. These two theorems are
fundamental to the further development of analysis. They are deeper results
than our work up to this point in that they depend on a special property of
vector spaces called completeness; we shall have to put off part of their proofs to
the next chapter, where we shall study completeness in a fairly systematic way.

In a number of starred sections at the end of the chapter we present some 
harder material that we do not expect the reader to master. However, he should 
try to get a rough idea of what is going on. 


1. REVIEW IN R 


Every student of the calculus is presumed to be familiar with the properties of
the real number system and the theory of limits. But we shall need more than
familiarity at this point. It will be absolutely essential that the student
understand the $\epsilon$-definitions and be able to work with them.

To be on the safe side, we shall review some of this material in the setting of
limits of functions; the confident reader can skip it. We suppose that all the
functions we consider are defined at least on an open interval containing $a$,
except possibly at $a$ itself. The need for this exception is shown by the difference
quotients of the calculus, which are not defined at the point near which their
behavior is crucial.


Definition. $f(x)$ approaches $l$ as $x$ approaches $a$ (in symbols, $f(x) \to l$ as
$x \to a$) if for every positive $\epsilon$ there exists a positive $\delta$ such that

\[
0 < |x - a| < \delta \implies |f(x) - l| < \epsilon .
\]


We also say that $l$ is the limit of $f(x)$ as $x$ approaches $a$ and write
$\lim_{x \to a} f(x) = l$. The displayed statement in the definition is understood to be
universally quantified in $x$, so that the definition really begins with the three
quantifiers $(\forall \epsilon^{>0})(\exists \delta^{>0})(\forall x)$. These prefixing quantifiers make the definition
sound artificial and unidiomatic when read as ordinary prose, but the reader will
remember from our introductory discussion of quantification that this artificiality
is absolutely necessary in order for the meaning of the sentence to be clear and
unambiguous. Any change in the order of the quantifiers $(\forall \epsilon)(\exists \delta)(\forall x)$ changes
the meaning of the statement.

The meaning of the inner universal quantification

\[
(\forall x)\bigl(0 < |x - a| < \delta \implies |f(x) - l| < \epsilon\bigr)
\]

is intuitive and easily pictured (see Fig. 3.1).

[Fig. 3.1]


For all $x$ closer to $a$ than $\delta$ the value of $f$ at $x$ is closer to $l$ than $\epsilon$. The
definition begins by stating that such a positive $\delta$ can be found for each positive $\epsilon$.
Of course, $\delta$ will vary with $\epsilon$; if $\epsilon$ is made smaller, we will generally have to
go closer to $a$, that is, we will have to take $\delta$ smaller, before all the values of $f$
on $(a - \delta, a + \delta) - \{a\}$ become $\epsilon$-close to $l$.

The variables '$\epsilon$' and '$\delta$' are almost always restricted to positive real
numbers, and from now on we shall let this restriction be implicit unless there seems
to be some special call for explicitness. Thus we shall write simply $(\forall \epsilon)(\exists \delta) \ldots$

The definition of convergence is used in various ways. In the simplest
situations we are given one or more functions having limits at $a$, say, $f(x) \to u$
and $g(x) \to v$, and we want to prove that some other function $h$ has a limit $w$
at $a$. In such cases we always try to find an inequality expressing the quantity we
wish to make small, $|h(x) - w|$, in terms of the quantities which we know can be
made small, $|f(x) - u|$ and $|g(x) - v|$.

For example, suppose that $h = f + g$. Since $f(x)$ is close to $u$ and $g(x)$ is
close to $v$, clearly $h(x)$ is close to $w = u + v$. But how close? Since $h(x) - w =
(f(x) - u) + (g(x) - v)$, we have

\[
|h(x) - w| \le |f(x) - u| + |g(x) - v| .
\]

From this it is clear that in order to make $|h(x) - w|$ less than $\epsilon$ it is sufficient
to make each of $|f(x) - u|$ and $|g(x) - v|$ less than $\epsilon/2$. Therefore, given any $\epsilon$,
we can take $\delta_1$ so that $0 < |x - a| < \delta_1 \implies |f(x) - u| < \epsilon/2$, and $\delta_2$ so that
$0 < |x - a| < \delta_2 \implies |g(x) - v| < \epsilon/2$, and we can then take $\delta$ as the smaller
of these two numbers, so that if $0 < |x - a| < \delta$, then both inequalities hold.
Thus

\[
0 < |x - a| < \delta \implies |h(x) - w| \le |f(x) - u| + |g(x) - v| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon ,
\]

and we have found the desired $\delta$ for the function $h$.

Suppose next that $u \ne 0$ and that $h = 1/f$. Clearly, $h(x)$ is close to $w = 1/u$
when $f(x)$ is close to $u$, and so we try to express $h(x) - w$ in terms of $f(x) - u$.
Thus

\[
h(x) - w = \frac{1}{f(x)} - \frac{1}{u} = \frac{u - f(x)}{f(x)\,u} ,
\]

and so $|h(x) - w| = |f(x) - u| / |f(x)u|$. The trouble here is that the
denominator is variable, and if it should happen to be very small, it might cancel the
smallness of $|f(x) - u|$ and not force a small quotient. But the answer to this
problem is easy. Since $f(x)$ is close to $u$ and $u$ is not zero, $f(x)$ cannot be close to
zero. For instance, if $f(x)$ is closer to $u$ than $|u|/2$, then $f(x)$ must be farther
from 0 than $|u|/2$. We therefore choose $\delta_1$ so that $0 < |x - a| < \delta_1 \implies
|f(x) - u| < |u|/2$, from which it follows that $|f(x)| > |u|/2$. Then

\[
|h(x) - w| \le 2|f(x) - u| / |u|^2 ,
\]

and now, given any $\epsilon$, we take $\delta_2$ so that

\[
0 < |x - a| < \delta_2 \implies |f(x) - u| < \epsilon |u|^2 / 2 .
\]

Again taking $\delta$ as the smaller of $\delta_1$ and $\delta_2$, so that both inequalities will hold
simultaneously when $0 < |x - a| < \delta$, we have

\[
0 < |x - a| < \delta \implies |h(x) - w| \le 2|f(x) - u| / |u|^2 < 2\epsilon |u|^2 / 2|u|^2 = \epsilon ,
\]

and again we have found our $\delta$ for the function $h$.
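
The choice of $\delta$ for the reciprocal can be tried out numerically. The sketch below is illustrative only: it assumes we are handed a function `delta_f` that returns a workable $\delta$ for $f$ at any given tolerance, and the names `delta_for_reciprocal` and `delta_f` are hypothetical. The sample function $f(x) = x$ near $a = 2$ is likewise just an assumed example.

```python
def delta_for_reciprocal(delta_f, u, eps):
    """Return a delta for h = 1/f at tolerance eps, where f(x) -> u and u != 0.

    delta_f(e) must return a delta such that
    0 < |x - a| < delta implies |f(x) - u| < e.
    """
    if u == 0 or eps <= 0:
        raise ValueError("need u != 0 and eps > 0")
    delta1 = delta_f(abs(u) / 2)        # forces |f(x)| > |u|/2
    delta2 = delta_f(eps * u * u / 2)   # forces |f(x) - u| < eps*|u|^2/2
    return min(delta1, delta2)


# Assumed example: f(x) = x near a = 2, so u = 2 and delta_f(e) = e works.
a, u, eps = 2.0, 2.0, 0.1
delta = delta_for_reciprocal(lambda e: e, u, eps)
```

Here the two candidate values are 1 and 0.2, so the smaller one, 0.2, is returned.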
We have tried to show how one would think about these situations. The
actual proof that would be written down would only show the choice of $\delta$. Thus,


Lemma 1.1. If $f(x) \to u$ and $g(x) \to v$ as $x \to a$, then $f(x) + g(x) \to u + v$
as $x \to a$.


Proof. Given $\epsilon$, choose $\delta_1$ so that $0 < |x - a| < \delta_1 \implies |f(x) - u| < \epsilon/2$
(by the assumed convergence of $f$ to $u$ at $a$), and, similarly, choose $\delta_2$ so that
$0 < |x - a| < \delta_2 \implies |g(x) - v| < \epsilon/2$. Take $\delta$ as the smaller of $\delta_1$ and $\delta_2$.
Then

\[
0 < |x - a| < \delta \implies |(f(x) + g(x)) - (u + v)|
\le |f(x) - u| + |g(x) - v| < \epsilon/2 + \epsilon/2 = \epsilon .
\]

Thus we have proved that for every $\epsilon$ there is a $\delta$ such that
$0 < |x - a| < \delta \implies |(f(x) + g(x)) - (u + v)| < \epsilon$,
and we are done. $\Box$
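
The proof of Lemma 1.1 is really a recipe for producing $\delta$, and it can be written as one. This is a hedged sketch rather than anything from the text: `delta_for_sum`, `delta_f`, and `delta_g` are hypothetical names, and the sample functions $f(x) = 2x$ and $g(x) = x + 1$ near $a = 1$ are assumptions chosen so that suitable $\delta$'s are easy to write down.

```python
def delta_for_sum(delta_f, delta_g, eps):
    """Return a delta for f + g at tolerance eps, as in the proof of Lemma 1.1.

    delta_f(e) and delta_g(e) must each return a delta that works for the
    corresponding function at tolerance e.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return min(delta_f(eps / 2), delta_g(eps / 2))


# Assumed example near a = 1: f(x) = 2x -> 2 and g(x) = x + 1 -> 2.
a = 1.0
f = lambda x: 2 * x
g = lambda x: x + 1
delta_f = lambda e: e / 2   # |2x - 2| < e whenever |x - 1| < e/2
delta_g = lambda e: e       # |(x + 1) - 2| < e whenever |x - 1| < e

eps = 0.01
delta = delta_for_sum(delta_f, delta_g, eps)
```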


In addition to understanding $\epsilon$-techniques in limit theory, it is necessary to
understand and to be able to use the fundamental property of the real number
system called the least upper bound property. In the following statement of the
property the semi-infinite interval $(-\infty, a]$ is of course the subset $\{x \in \mathbb{R} : x \le a\}$.


If $A$ is a nonempty subset of $\mathbb{R}$ such that $A \subset (-\infty, a]$ for some $a$, then
there exists a uniquely determined smallest number $b$ such that $A \subset (-\infty, b]$.


A number $a$ such that $A \subset (-\infty, a]$ is called an upper bound of $A$; clearly, $a$
is an upper bound of $A$ if and only if every $x$ in $A$ is less than or equal to $a$.
A set having an upper bound is said to be bounded above. The property says that
a nonempty set $A$ which is bounded above has a least upper bound (lub). If
we reverse the order relation by multiplying everything by $-1$, then we have the
alternative formulation which asserts that a nonempty subset of $\mathbb{R}$ that is
bounded below has a greatest lower bound (glb). The least upper bound of the
interval $(0, 1)$ is 1. The least upper bound of $[0, 1]$ is also 1. The greatest lower
bound of $\{1/n : n \text{ a positive integer}\}$ is 0. Furthermore, lub $\{x : x$ is a positive
rational number and $x^2 < 2\} = \sqrt{2}$, glb $\{e^x : x \in \mathbb{R}\} = 0$, and lub $\{e^x : x$ is
rational and $x < \sqrt{2}\} = e^{\sqrt{2}}$.


EXERCISES 


1.1 Prove that if $f(x) \to l$ and $f(x) \to m$ as $x \to a$, then $l = m$. We can therefore
talk about the limit of $f$ as $x \to a$.

1.2 Prove that if $f(x) \to l$ and $g(x) \to m$ (as $x \to a$), then $f(x)g(x) \to lm$ as $x \to a$.

1.3 Prove that $|x - a| < |a|/2 \implies |x| > |a|/2$.

1.4 Prove (in detail) the greatest lower bound property from the least upper bound
property.

1.5 Show that lub $A = x$ if and only if $x$ is an upper bound of $A$ and, for every
positive $\epsilon$, $x - \epsilon$ is not an upper bound of $A$.


1.6 Let $A$ and $B$ be subsets of $\mathbb{R}$ that are nonempty and bounded above. Show that
$A + B$ is nonempty and bounded above and that lub $(A + B)$ = lub $A$ + lub $B$.


1.7 Formulate and prove a correct theorem about the least upper bound of the 
product of two sets. 

1.8 Define the notion of a one-sided limit for a function whose domain is a subset of $\mathbb{R}$.
For example, we want to be able to discuss the limit of $f(x)$ as $x$ approaches $a$ from
below, which we might designate

\[
\lim_{x \uparrow a} f(x) .
\]

1.9 If the domain of a real-valued function $f$ is an interval, say $[a, b]$, we say that $f$ is
an increasing function if

\[
x < y \implies f(x) \le f(y) .
\]


Prove that an increasing function has one-sided limits everywhere. 


1.10 Let $[a, b]$ be a closed interval in $\mathbb{R}$, and let $f: [a, b] \to \mathbb{R}$ be increasing. Show that
$\lim_{x \to y} f(x) = f(y)$ for all $y$ in $[a, b]$ ($f$ is continuous on $[a, b]$) if and only if the range
of $f$ does not omit any subinterval $(c, d) \subset [f(a), f(b)]$. [Hint: Suppose the range omits
$(c, d)$, and set $y$ = lub $\{x : f(x) \le c\}$. Then $f(x) \nrightarrow f(y)$ as $x \to y$.]

1.11 A set that intersects every open subinterval of an interval $[s, t]$ is said to be
dense in $[s, t]$. Show that if $f: [a, b] \to \mathbb{R}$ is increasing and range $f$ is dense in $[f(a), f(b)]$,
then range $f = [f(a), f(b)]$. (For any $z$ between $f(a)$ and $f(b)$ set $y$ = lub $\{x : f(x) \le z\}$,
etc.)

1.12 Assuming the results of the above two exercises, show that if $f$ is a continuous
strictly increasing function from $[a, b]$ to $\mathbb{R}$, and if $r = f(a)$ and $s = f(b)$, then $f^{-1}$ is a
continuous strictly increasing function from $[r, s]$ to $\mathbb{R}$. [A function $f$ is continuous if
$f(x) \to f(y)$ as $x \to y$ for every $y$ in its domain; it is strictly increasing if $x < y \implies
f(x) < f(y)$.]

1.13 Argue somewhat as in Exercise 1.11 above to prove that if $f: [a, b] \to \mathbb{R}$ is
continuous on $[a, b]$, then the range of $f$ includes $[f(a), f(b)]$. This is the
intermediate-value theorem.

1.14 Suppose the function $q: \mathbb{R} \to \mathbb{R}$ satisfies $q(xy) = q(x)\,q(y)$ for all $x, y \in \mathbb{R}$.
Note that $q(x) = x^n$ ($n$ a positive integer) and $q(x) = |x|^r$ ($r$ any real number) satisfy
this "functional equation". So does $q(x) \equiv 0$ ($r = -\infty$?). Show that if $q$ satisfies the
functional equation and $q(x) > 1$ for $x > 1$, then there is a real number $r > 0$ such
that $q(x) = x^r$ for all positive $x$.




1.15 Show that if $q$ is continuous and satisfies the functional equation $q(xy) =
q(x)\,q(y)$ for all $x, y \in \mathbb{R}$, and if there is at least one point $a$ where $q(a) \ne 0, 1$, then
$q(x) = x^r$ for all positive $x$. Conclude that if also $q$ is nonnegative, then $q(x) = |x|^r$ on $\mathbb{R}$.

1.16 Show that if $q(x) = |x|^r$, and if $q(x + y) \le q(x) + q(y)$, then $r \le 1$. (Try $y = 1$
and $x$ large; what is $q'(x)$ like if $r > 1$?)


2. NORMS 


In the limit theory of $\mathbb{R}$, as reviewed briefly above, the absolute-value function
is used prominently in expressions like '$|x - y|$' to designate the distance
between two numbers, here between $x$ and $y$. The definition of the convergence
of $f(x)$ to $u$ is simply a careful statement of what it means to say that the distance
$|f(x) - u|$ tends to zero as the distance $|x - a|$ tends to zero. The properties of
$|x|$ which we have used in our proofs are

1) $|x| > 0$ if $x \ne 0$, and $|0| = 0$;

2) $|xy| = |x|\,|y|$;

3) $|x + y| \le |x| + |y|$.

The limit theory of vector spaces is studied in terms of functions called
norms, which serve as multidimensional analogues of the absolute-value function
on $\mathbb{R}$. Thus, if $p: V \to \mathbb{R}$ is a norm, then we want to interpret $p(\alpha)$ as the "size"
of $\alpha$ and $p(\alpha - \beta)$ as the "distance" between $\alpha$ and $\beta$. However, if $V$ is not
one-dimensional, there is no one notion of size that is most natural. For example,
if $f$ is a positive continuous function on $[a, b]$, and if we ask the reader for a
number which could be used as a measure of how "large" $f$ is, there are two
possibilities that will probably occur to him: the maximum value of $f$ and the area
under the graph of $f$. Certainly, $f$ must be considered small if $\max f$ is small.
But also, we would have to agree that $f$ is small in a different sense if its area is
small. These are two examples of norms on the vector space $V = \mathcal{C}([a, b])$ of all
continuous functions on $[a, b]$:

\[
p(f) = \max\{|f(t)| : t \in [a, b]\} \qquad\text{and}\qquad q(f) = \int_a^b |f(t)|\,dt .
\]

Note that $f$ can be small in the second sense and not in the first.
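
A narrow spike makes the last remark concrete. The sketch below is an illustration under stated assumptions, not something taken from the text: the helper names `spike`, `max_norm`, and `integral_norm` are hypothetical, the spike function on $[0, 1]$ is an assumed example (nonnegative rather than strictly positive), and both "norms" are only approximated on a finite grid.

```python
def spike(n):
    """Continuous function on [0, 1]: height 1 at t = 0, zero for t >= 1/n."""
    return lambda t: max(0.0, 1.0 - n * t)


def max_norm(f, a, b, steps=10_000):
    """Grid approximation of max |f(t)| over [a, b]."""
    return max(abs(f(a + (b - a) * k / steps)) for k in range(steps + 1))


def integral_norm(f, a, b, steps=10_000):
    """Trapezoid-rule approximation of the integral of |f| over [a, b]."""
    h = (b - a) / steps
    ys = [abs(f(a + h * k)) for k in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


f = spike(100)
size_max = max_norm(f, 0.0, 1.0)        # stays at 1 however narrow the spike
size_area = integral_norm(f, 0.0, 1.0)  # about 1/(2*100) = 0.005
```

Increasing the argument of `spike` shrinks the area toward zero while the maximum stays at 1.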
In order to be useful, a notion of size for a vector must have properties
analogous to those of the absolute-value function on $\mathbb{R}$.

Definition. A norm is a real-valued function $p$ on a vector space $V$ such that
n1. $p(\alpha) > 0$ if $\alpha \ne 0$ (positivity);
n2. $p(x\alpha) = |x|\,p(\alpha)$ for all $\alpha \in V$, $x \in \mathbb{R}$ (homogeneity);
n3. $p(\alpha + \beta) \le p(\alpha) + p(\beta)$ for all $\alpha, \beta \in V$ (triangle inequality).


A normed linear space (nls), or normed vector space, is a vector space $V$
together with a norm $p$ on $V$. A normed linear space is thus really a pair
$\langle V, p \rangle$, but generally we speak simply of the normed linear space $V$, a definite
norm on $V$ then being understood.

It has been customary to designate the norm of $\alpha$ by $\|\alpha\|$, presumably to
suggest the analogy with absolute value. The triangle inequality n3 then
becomes $\|\alpha + \beta\| \le \|\alpha\| + \|\beta\|$, which is almost identical in form with the basic
absolute-value inequality $|x + y| \le |x| + |y|$. Similarly, n2 becomes $\|x\alpha\| =
|x|\,\|\alpha\|$, analogous to $|xy| = |x|\,|y|$ in $\mathbb{R}$. Furthermore, $\|\alpha - \beta\|$ is similarly
interpreted as the distance between $\alpha$ and $\beta$. This is reasonable since if we set
$\alpha = \xi - \eta$ and $\beta = \eta - \zeta$, then n3 becomes the usual triangle inequality of
geometry:

\[
\|\xi - \zeta\| \le \|\xi - \eta\| + \|\eta - \zeta\| .
\]

We shall use both the double bar notation and the "$p$"-notation for norms; each
is on occasion superior to the other.

The most commonly used norms on $\mathbb{R}^n$ are $\|x\|_1 = \sum_1^n |x_i|$, the Euclidean
norm $\|x\|_2 = (\sum_1^n x_i^2)^{1/2}$, and $\|x\|_\infty = \max\{|x_i| : 1 \le i \le n\}$. Similar norms on the
infinite-dimensional vector space $\mathcal{C}([a, b])$ of all continuous real-valued functions
on $[a, b]$ are

\[
\|f\|_1 = \int_a^b |f(t)|\,dt, \qquad
\|f\|_2 = \Bigl(\int_a^b |f(t)|^2\,dt\Bigr)^{1/2}, \qquad
\|f\|_\infty = \max\{|f(t)| : a \le t \le b\} .
\]

It should be easy for the reader to check that $\|\ \|_1$ is a norm in both cases
above, and we shall take up the so-called uniform norms $\|\ \|_\infty$ in the next
paragraph. The Euclidean norms $\|\ \|_2$ are trickier; their properties depend on
scalar product considerations. These will be discussed in Chapter 5. Meanwhile,
so that the reader can use the Euclidean norm $\|\ \|_2$ on $\mathbb{R}^n$, we shall ask him to
prove the triangle inequality for it (the other axioms being obvious) by brute
force in an exercise. On $\mathbb{R}$ itself the absolute value is a norm, and it is the only
norm to within a constant multiple.
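
The three norms on $\mathbb{R}^n$ are easy to compute directly. The following sketch is illustrative; the function names `norm_1`, `norm_2`, and `norm_inf` are hypothetical, and vectors are assumed to be plain Python sequences of numbers.

```python
import math


def norm_1(x):
    """Sum of the absolute values of the coordinates."""
    return sum(abs(xi) for xi in x)


def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(xi * xi for xi in x))


def norm_inf(x):
    """Largest absolute value of a coordinate."""
    return max(abs(xi) for xi in x)


x = (3.0, -4.0)
y = (1.0, 2.0)
x_plus_y = tuple(a + b for a, b in zip(x, y))
```

For $x = (3, -4)$ the three values are 7, 5, and 4. The sample vectors also give one instance of the triangle inequality for each norm, which of course is a spot check and not a proof.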

We can transfer the above norms on R” to arbitrary finite-dimensional 
spaces by the following general remark. 


Lemma 2.1. If $p$ is a norm on a vector space $W$ and $T$ is an injective linear
map from a vector space $V$ to $W$, then $p \circ T$ is a norm on $V$.


Proof. The proof is left to the reader. 


Uniform norms. The two norms $\|\ \|_\infty$ considered above are special cases of a
very general situation. Let $A$ be an arbitrary nonempty set, and let $\mathcal{B}(A, \mathbb{R})$
be the set of all bounded functions $f: A \to \mathbb{R}$. That is, $f \in \mathcal{B}(A, \mathbb{R})$ if and only if
$f \in \mathbb{R}^A$ and range $f \subset [-b, b]$ for some $b \in \mathbb{R}$. This is the same as saying that
range $|f| \subset [0, b]$, and we call any such $b$ a bound of $|f|$. The set $\mathcal{B}(A, \mathbb{R})$ is a
vector space $V$, since if $|f|$ and $|g|$ are bounded by $b$ and $c$, respectively, then
$|xf + yg|$ is bounded by $|x|b + |y|c$. The uniform norm $\|f\|_\infty$ is defined as the
smallest bound of $|f|$. That is,

\[
\|f\|_\infty = \operatorname{lub}\{|f(p)| : p \in A\} .
\]

Of course, it has to be checked that $\|\ \|_\infty$ is a norm. For any $p$ in $A$,

\[
|f(p) + g(p)| \le |f(p)| + |g(p)| \le \|f\|_\infty + \|g\|_\infty .
\]

Thus $\|f\|_\infty + \|g\|_\infty$ is a bound of $|f + g|$ and is therefore greater than or equal to
the smallest such bound, which is $\|f + g\|_\infty$. This gives the triangle inequality.
Next we note that if $x \ne 0$, then $b$ bounds $|f|$ if and only if $|x|b$ bounds $|xf|$, and
it follows that $\|xf\|_\infty = |x|\,\|f\|_\infty$. Finally, $\|f\|_\infty \ge 0$, and $\|f\|_\infty = 0$ only if $f$ is
the zero function.

We can replace $\mathbb{R}$ by any normed linear space $W$ in the above discussion.
A function $f: A \to W$ is bounded by $b$ if and only if $\|f(p)\| \le b$ for all $p$ in $A$,
and we define the corresponding uniform norm on $\mathcal{B}(A, W)$ by

\[
\|f\|_\infty = \operatorname{lub}\{\|f(p)\| : p \in A\} .
\]

If $f \in \mathcal{C}([0, 1])$, then we know that the continuous function $f$ assumes the
least upper bound of its range as a value (that is, $f$ "assumes its maximum value"),
so that then $\|f\|_\infty$ is the maximum value of $|f|$. In general, however, the definition
must be given in terms of lub.


Balls. Remembering that $\|\alpha - \xi\|$ is interpreted as the distance from $\alpha$ to $\xi$, it is
natural to define the open ball of radius $r$ about the center $\alpha$ as $\{\xi : \|\alpha - \xi\| < r\}$.
We designate this ball $B_r(\alpha)$. Translation through $\beta$ preserves distance,

\[
\|T_\beta(\xi) - T_\beta(\alpha)\| = \|(\xi + \beta) - (\alpha + \beta)\| = \|\xi - \alpha\| ,
\]

and therefore $\xi \in B_r(\alpha)$ if and only if $\xi + \beta \in B_r(\alpha + \beta)$. That is, translation
through $\beta$ carries $B_r(\alpha)$ into $B_r(\alpha + \beta)$: $T_\beta[B_r(\alpha)] = B_r(\alpha + \beta)$. Also, scalar
multiplication by a nonzero $c$ multiplies all distances by $|c|$, and it follows in a
similar way that $cB_r(\alpha) = B_{|c|r}(c\alpha)$.

Although $B_r(\alpha)$ behaves like a ball, the actual set being defined is different
for different norms, and some of them "look unspherelike". The unit balls about
the origin in $\mathbb{R}^2$ for the three norms $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$ are shown in Fig. 3.2.

A subset $A$ of a nls $V$ is bounded if it lies in some ball, say $B_r(\alpha)$. Then it
also lies in a ball about the origin, namely $B_{r+\|\alpha\|}(0)$. This is simply the fact that
if $\|\xi - \alpha\| < r$, then $\|\xi\| < r + \|\alpha\|$, which we get from the triangle inequality
upon rewriting $\|\xi\|$ as $\|(\xi - \alpha) + \alpha\|$.

The radius of the largest ball about a vector $\beta$ which does not touch a set $A$
is naturally called the distance from $\beta$ to $A$. It is clearly glb $\{\|\xi - \beta\| : \xi \in A\}$
(see Fig. 3.3).
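
Ball membership and the distance to a set can both be computed once a norm is fixed. The sketch below is only an illustration: the names `in_open_ball` and `distance_to_finite_set` are hypothetical, vectors are assumed to be tuples of floats, and the set $A$ is assumed finite so that the glb is simply a minimum.

```python
import math


def norm_1(x):
    return sum(abs(xi) for xi in x)


def norm_2(x):
    return math.sqrt(sum(xi * xi for xi in x))


def norm_inf(x):
    return max(abs(xi) for xi in x)


def in_open_ball(norm, xi, center, r):
    """True when xi lies in the open ball of radius r about center."""
    return norm([a - b for a, b in zip(xi, center)]) < r


def distance_to_finite_set(norm, beta, points):
    """glb of the distances from beta to the points; a minimum for a finite set."""
    return min(norm([a - b for a, b in zip(p, beta)]) for p in points)


origin = (0.0, 0.0)
point = (0.6, 0.6)
membership = {
    "norm_1": in_open_ball(norm_1, point, origin, 1.0),
    "norm_2": in_open_ball(norm_2, point, origin, 1.0),
    "norm_inf": in_open_ball(norm_inf, point, origin, 1.0),
}
dist = distance_to_finite_set(norm_2, origin, [(3.0, 4.0), (1.0, 1.0)])
```

The point (0.6, 0.6) lies in the unit balls for the two-norm and the uniform norm but not for the one-norm, which is one way of seeing that the three unit balls are different sets.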


[Fig. 3.2]   [Fig. 3.3: $\rho(\beta, A) = r$]   [Fig. 3.4]


A point $\alpha$ is an interior point of a set $A$ if some ball about $\alpha$ is included in $A$.
This is equivalent to saying that the distance from $\alpha$ to the complement of $A$ is
positive (supposing that $A$ is not the whole of $V$), and should coincide with
the reader's intuitive notion of what an "inside" point should be. A subset $A$
of a normed linear space is said to be open if every point of $A$ is an interior
point.

If our language is to be consistent, an open ball should be an open set. It is:
if $\alpha \in B_r(\beta)$, then $\|\alpha - \beta\| < r$, and then $B_\delta(\alpha) \subset B_r(\beta)$, provided that
$\delta \le r - \|\alpha - \beta\|$, by virtue of the triangle inequality (see Fig. 3.4). The reader should
write down the detailed proof. He has to show that if $\xi \in B_\delta(\alpha)$, then $\xi \in B_r(\beta)$.
Our intuitions about distances are quite trustworthy, but they should always be
checked by a computation. The reader probably can see by a mental argument
that the union of any collection of open sets is open. In particular, the union of
any collection of open balls is open (Fig. 3.5), and this is probably the most
intuitive way of visualizing an open set. (See Exercise 2.9.)


[Fig. 3.5]   [Fig. 3.6]


A subset $C$ is said to be closed if its complement $C'$ is open.

Our discussion above shows that a nonempty set $C$ is closed if and only if
every point not in it is at a positive distance from it: $\alpha \notin C \implies \rho(\alpha, C) > 0$.
The so-called closed ball of radius $r$ about $\beta$, $B = \{\xi : \|\xi - \beta\| \le r\}$, is a closed
set. As Fig. 3.6 suggests, the proof is another application of the triangle
inequality.


EXERCISES 


2.1 Show that if $\|\xi - \alpha\| < \|\alpha\|/2$, then $\|\xi\| > \|\alpha\|/2$.

2.2 Prove in detail that

\[
\|x\|_1 = \sum_{i=1}^{n} |x_i|
\]

is a norm on $\mathbb{R}^n$. Also prove that

\[
\|f\|_1 = \int_a^b |f(t)|\,dt
\]

is a norm on $\mathcal{C}([a, b])$.

2.3 For $x$ in $\mathbb{R}^n$ let $|x|$ be the Euclidean length

\[
|x| = \Bigl[\sum_{i=1}^{n} x_i^2\Bigr]^{1/2} ,
\]

and let $(x, y)$ be the scalar product

\[
(x, y) = \sum_{i=1}^{n} x_i y_i .
\]

The Schwarz inequality says that

\[
|(x, y)| \le |x|\,|y|
\]

and that the inequality is strict if $x$ and $y$ are independent.
a) Prove the Schwarz inequality for the case $n = 2$ by squaring and canceling.
b) Now prove it for the general $n$ in the same way.

2.4 Continuing the above exercise, prove that the Euclidean length $|x|$ is a norm.
The crucial step is the triangle inequality, $|x + y| \le |x| + |y|$. Reduce it to the
Schwarz inequality by squaring and canceling. This is of course our two-norm $\|x\|_2$.

2.5 Prove that the unit balls for the norms $\|\ \|_1$ and $\|\ \|_\infty$ on $\mathbb{R}^2$ are as shown in
Fig. 3.2.

2.6 Prove that an open ball is an open set. 

2.7 Prove that a closed ball is a closed set. 

2.8 Give an example of a subset of R? that is neither open nor closed. 

2.9 Show from the definition of an open set that any open set is the union of a 
family (perhaps very large!) of open balls. Show that any union of open sets is open. 
Conclude, therefore, that a set is open if and only if it is a union of open balls. 

2.10 A subset $A$ of a normed linear space $V$ is said to be convex if $A$ includes the line
segment joining any two of its points. We know that the line segment from $\alpha$ to $\beta$ is
the image of $[0, 1]$ under the mapping $t \mapsto t\beta + (1 - t)\alpha$. Thus $A$ is convex if and
only if $\alpha, \beta \in A$ and $t \in [0, 1] \implies t\beta + (1 - t)\alpha \in A$. Prove that every ball $B_r(\gamma)$ in
a normed linear space $V$ is convex.

2.11 A seminorm is the same as a norm except that the positivity condition n1 is
relaxed to nonnegativity:

n1'. $p(\alpha) \ge 0$ for all $\alpha$.

Thus $p(\alpha)$ may be 0 for some nonzero $\alpha$. Every norm is in particular a seminorm.
Prove:
a) If $p$ is a seminorm on a vector space $W$ and $T$ is a linear mapping from $V$ to $W$,
then $p \circ T$ is a seminorm on $V$.
b) $p \circ T$ is a norm if and only if $T$ is injective and $p$ is a norm on range $T$.

2.12 Show that the sum of two seminorms is a seminorm.

2.13 Prove from the above two exercises (and not by a direct calculation) that

\[
q(f) = \|f'\|_\infty + |f(t_0)|
\]

is a seminorm on the space $\mathcal{C}^1([a, b])$ of all continuously differentiable real-valued
functions on $[a, b]$, where $t_0$ is a fixed point in $[a, b]$. Prove that $q$ is a norm.

2.14 Show that the sum of two bounded sets is bounded. 

2.15 Prove that the sum $B_r(\alpha) + B_s(\beta)$ is exactly the ball $B_{r+s}(\alpha + \beta)$.


3. CONTINUITY 


Let $V$ and $W$ be any two normed linear spaces. We shall designate both norms
by $\|\ \|$. This ambiguous usage does not cause confusion. It is like the ambiguous
use of "0" for the zero elements of all the vector spaces under consideration. If we
replace the absolute value sign $|\ |$ by the general norm symbol $\|\ \|$ in the
definition we gave earlier for the limit of a real-valued function of a real variable,
it becomes verbatim the corresponding definition of convergence in the general
setting. However, we shall repeat the definition and take the occasion to relax
the hypothesis on the domain of $f$. Accordingly, let $A$ be any subset of $V$, and let
$f$ be any mapping from $A$ to $W$.


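Definition. We say that $f(\xi)$ approaches $\beta$ as $\xi$ approaches $\alpha$, and write
$f(\xi) \to \beta$ as $\xi \to \alpha$, if for every $\epsilon$ there is a $\delta$ such that

\[
\xi \in A \text{ and } 0 < \|\xi - \alpha\| < \delta \implies \|f(\xi) - \beta\| < \epsilon .
\]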


If $\alpha \in A$ and $f(\xi) \to f(\alpha)$ as $\xi \to \alpha$, then we say that $f$ is continuous at $\alpha$.
We can then drop the requirement that $\xi \ne \alpha$ and have the direct
$\epsilon,\delta$-characterization of continuity: $f$ is continuous at $\alpha$ if for every $\epsilon$ there exists a $\delta$
such that $\|\xi - \alpha\| < \delta \implies \|f(\xi) - f(\alpha)\| < \epsilon$. It is understood here that $\xi$ is
universally quantified over the domain $A$ of $f$. We say that $f$ is continuous if $f$ is
continuous at every point $\alpha$ in its domain. If the absolute value of a number is
replaced by the norm of a vector, the limit theorems that we sampled in Section 1
hold verbatim for normed linear spaces. We shall ask the reader to write out a
few of these transcriptions in the exercises.

There is a property stronger than continuity at $\alpha$ which is much simpler to
use when it is available. We say that $f$ is Lipschitz continuous at $\alpha$ if there is a
constant $c$ such that $\|f(\xi) - f(\alpha)\| \le c\|\xi - \alpha\|$ for all $\xi$ sufficiently close to $\alpha$.
That is, there are constants $c$ and $r$ such that

\[
\|\xi - \alpha\| < r \implies \|f(\xi) - f(\alpha)\| \le c\|\xi - \alpha\| .
\]

The point is that now we can take $\delta$ simply as $\epsilon/c$ (provided $\epsilon$ is small enough so
that this makes $\delta \le r$; otherwise we have to set $\delta = \min\{\epsilon/c, r\}$). We say
that $f$ is a Lipschitz function (on its domain $A$) if there is a constant $c$ such that
$\|f(\xi) - f(\eta)\| \le c\|\xi - \eta\|$ for all $\xi, \eta$ in $A$. For a linear map $T: V \to W$ the
Lipschitz inequality is more simply written as

\[
\|T(\zeta)\| \le c\|\zeta\|
\]

for all $\zeta \in V$; we just use the fact that now $T(\xi) - T(\eta) = T(\xi - \eta)$ and set
$\zeta = \xi - \eta$. In this context it is conventional to call $T$ a bounded linear mapping
rather than a Lipschitz linear mapping, and any such $c$ is called a bound of $T$.

We know from the beginning calculus that if $f$ is a continuous real-valued
function on $[a, b]$ (that is, if $f \in \mathcal{C}([a, b])$), then $|\int_a^b f(x)\,dx| \le m(b - a)$, where
$m$ is the maximum value of $|f(x)|$. But this is just the uniform norm of $f$, so that
the inequality can be rewritten as $|\int_a^b f| \le (b - a)\|f\|_\infty$. This shows that if the
uniform norm is used on $\mathcal{C}([a, b])$, then $f \mapsto \int_a^b f$ is a bounded linear functional,
with bound $b - a$.
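As an informal numerical illustration of this bound (a sketch only, not part of the original development), the following Python fragment approximates the integral by a midpoint sum and the uniform norm by a maximum over a grid. The helper names `midpoint_integral` and `uniform_norm_on_grid`, the interval $[0, 2]$, and the sample integrand are assumptions chosen for the example; the grid maximum is only a lower estimate of the true uniform norm.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the Riemann integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def uniform_norm_on_grid(f, a, b, n=10_000):
    """Approximate the uniform norm of f on [a, b] by its maximum over a grid."""
    return max(abs(f(a + i * (b - a) / n)) for i in range(n + 1))

a, b = 0.0, 2.0

# A sample integrand: |integral of sin| should not exceed (b - a) * ||sin||_inf.
integral_sin = midpoint_integral(math.sin, a, b)
bound_sin = (b - a) * uniform_norm_on_grid(math.sin, a, b)

# The constant function 1 has uniform norm 1 and integral b - a,
# so for it the inequality holds with equality (up to rounding).
integral_one = midpoint_integral(lambda x: 1.0, a, b)
bound_one = (b - a) * uniform_norm_on_grid(lambda x: 1.0, a, b)

print(abs(integral_sin) <= bound_sin)
print(math.isclose(integral_one, bound_one))
```

The constant function shows why no bound smaller than $b - a$ can work, which is the observation the text uses later when it computes the norm of the integral.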

It should immediately be pointed out that this is not the same notion of 
boundedness we discussed earlier. There we called a real-valued function 
bounded if its range was a bounded subset of R. The analogue here would be to 
call a vector-valued function bounded if its range is norm bounded. But a 
nonzero linear transformation cannot be bounded in this sense, because 


$$\|T(x\alpha)\| = |x|\,\|T(\alpha)\|.$$

The present definition amounts to the boundedness in the earlier sense of the
quotient $T(\alpha)/\|\alpha\|$ (on $V - \{0\}$). It turns out that for a linear map $T$, being
continuous and being Lipschitz are the same thing.


Theorem 3.1. Let $T$ be a linear mapping from a normed linear space $V$ to a
normed linear space $W$. Then the following conditions are equivalent:

1) $T$ is continuous at one point;

2) $T$ is continuous;

3) $T$ is bounded.

Proof. (1) $\Rightarrow$ (3). Suppose $T$ is continuous at $\alpha_0$. Then, taking $\epsilon = 1$, there
exists $\delta$ such that $\|\alpha - \alpha_0\| < \delta \Rightarrow \|T(\alpha) - T(\alpha_0)\| < 1$. Setting $\xi = \alpha - \alpha_0$
and using the additivity of $T$, we have $\|\xi\| < \delta \Rightarrow \|T(\xi)\| < 1$. Now for any
nonzero $\eta$, $\xi = \delta\eta/2\|\eta\|$ has norm $\delta/2$. Therefore, $\|T(\xi)\| < 1$. But
$\|T(\xi)\| = \delta\|T(\eta)\|/2\|\eta\|$, giving $\|T(\eta)\| < 2\|\eta\|/\delta$. Thus $T$ is bounded by
$C = 2/\delta$.




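(3) $\Rightarrow$ (2). Suppose $\|T(\xi)\| \le C\|\xi\|$ for all $\xi$. Then for any $\alpha_0$ and any $\epsilon$
we can take $\delta = \epsilon/C$ and have

$$\|\alpha - \alpha_0\| < \delta \;\Rightarrow\; \|T(\alpha) - T(\alpha_0)\| = \|T(\alpha - \alpha_0)\| \le C\|\alpha - \alpha_0\| < C\delta = \epsilon.$$

(2) $\Rightarrow$ (1). Trivial. □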

In the lemma below we prove that the norm function is a Lipschitz function 
from V to R. 

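Lemma 3.1. For all $\alpha, \beta \in V$, $\big|\,\|\alpha\| - \|\beta\|\,\big| \le \|\alpha - \beta\|$.

Proof. We have $\|\alpha\| = \|(\alpha - \beta) + \beta\| \le \|\alpha - \beta\| + \|\beta\|$, so that
$\|\alpha\| - \|\beta\| \le \|\alpha - \beta\|$. Similarly, $\|\beta\| - \|\alpha\| \le \|\beta - \alpha\| = \|\alpha - \beta\|$. This pair of
inequalities is equivalent to the lemma. □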
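The lemma can be spot-checked numerically. The sketch below is an illustration only: the three norms on $\mathbb{R}^4$, the random sampling, and the helper name `lemma_gap` are assumptions introduced here, and a finite sample of course proves nothing. It computes $\|\alpha - \beta\| - \big|\,\|\alpha\| - \|\beta\|\,\big|$, which the lemma says is never negative.

```python
import math
import random

def norm1(v):
    return sum(abs(x) for x in v)

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def norm_inf(v):
    return max(abs(x) for x in v)

def lemma_gap(norm, a, b):
    """Return ||a - b|| - | ||a|| - ||b|| |; Lemma 3.1 asserts this is >= 0."""
    diff = [x - y for x, y in zip(a, b)]
    return norm(diff) - abs(norm(a) - norm(b))

random.seed(0)  # fixed seed so the run is reproducible
smallest_gap = float("inf")
for norm in (norm1, norm2, norm_inf):
    for _ in range(1000):
        a = [random.uniform(-5, 5) for _ in range(4)]
        b = [random.uniform(-5, 5) for _ in range(4)]
        smallest_gap = min(smallest_gap, lemma_gap(norm, a, b))

# A tiny negative tolerance allows for floating-point rounding.
print(smallest_gap >= -1e-12)
```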

Other Lipschitz mappings will appear when we study mappings with con- 
tinuous differentials. Roughly speaking, the Lipschitz property lies between 
continuity and continuous differentiability, and it is frequently the condition 
that we actually apply under the hypothesis of continuous differentiability. 

The smallest bound of a bounded linear transformation $T$ is called its norm.
That is,

$$\|T\| = \operatorname{lub}\,\{\|T(\alpha)\|/\|\alpha\| : \alpha \ne 0\}.$$

For example, let $T: \mathcal{C}([a, b]) \to \mathbb{R}$ be the Riemann integral, $T(f) = \int_a^b f(x)\,dx$.
We saw earlier that if we use the uniform norm $\|f\|_\infty$ on $\mathcal{C}([a, b])$, then $T$ is
bounded by $b - a$: $|T(f)| \le (b - a)\|f\|_\infty$. On the other hand, there is no smaller
bound, because $\int_a^b 1 = b - a = (b - a)\|1\|_\infty$. Thus $\|T\| = b - a$. Other
formulations of the above definition are useful. Since

$$\|T(\alpha)\|/\|\alpha\| = \|T(\alpha/\|\alpha\|)\|$$

by homogeneity, and since $\beta = \alpha/\|\alpha\|$ has norm 1, we have

$$\|T\| = \operatorname{lub}\,\{\|T(\beta)\| : \|\beta\| = 1\}.$$

Finally, if $\|\gamma\| \le 1$, then $\gamma = x\beta$, where $\|\beta\| = 1$ and $|x| \le 1$, and

$$\|T(\gamma)\| = |x|\,\|T(\beta)\| \le \|T(\beta)\|.$$

We therefore have an inefficient but still useful characterization:

$$\|T\| = \operatorname{lub}\,\{\|T(\gamma)\| : \|\gamma\| \le 1\}.$$
These last two formulations are uniform norms. Thus, if $B_1$ is the closed unit
ball $\{\xi : \|\xi\| \le 1\}$, we see that a linear $T$ is bounded if and only if $T \upharpoonright B_1$ is
bounded in the old sense, and then

$$\|T\| = \|T \upharpoonright B_1\|_\infty.$$
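The three formulations of $\|T\|$ can be compared numerically in a small case. The following sketch is an illustration under stated assumptions: the space is $\mathbb{R}^2$ with the Euclidean norm, $T$ is the shear with matrix rows $(1, 1)$ and $(0, 1)$, and the suprema are estimated by sampling, so each estimate is only a lower approximation. For this particular matrix the exact norm is the largest singular value, $(1 + \sqrt{5})/2$, which is a standard linear-algebra fact rather than something established in the text. All function names are hypothetical helpers.

```python
import math

def apply(t, x):
    """Apply the 2x2 matrix t (a list of rows) to the vector x in R^2."""
    return (t[0][0] * x[0] + t[0][1] * x[1], t[1][0] * x[0] + t[1][1] * x[1])

def euclid(v):
    return math.hypot(v[0], v[1])

def sup_of_quotient(t, angles=720, lengths=(0.5, 1.0, 3.0)):
    """Sampled lub of ||T(alpha)|| / ||alpha|| over nonzero alpha."""
    best = 0.0
    for s in lengths:
        for k in range(angles):
            a = 2 * math.pi * k / angles
            v = (s * math.cos(a), s * math.sin(a))
            best = max(best, euclid(apply(t, v)) / euclid(v))
    return best

def sup_on_sphere(t, angles=3600):
    """Sampled lub of ||T(beta)|| over unit vectors beta."""
    best = 0.0
    for k in range(angles):
        a = 2 * math.pi * k / angles
        best = max(best, euclid(apply(t, (math.cos(a), math.sin(a)))))
    return best

def sup_on_ball(t, angles=720, radii=20):
    """Sampled lub of ||T(gamma)|| over vectors with ||gamma|| <= 1."""
    best = 0.0
    for j in range(radii + 1):
        s = j / radii
        for k in range(angles):
            a = 2 * math.pi * k / angles
            best = max(best, euclid(apply(t, (s * math.cos(a), s * math.sin(a)))))
    return best

shear = [[1.0, 1.0], [0.0, 1.0]]
exact = (1 + math.sqrt(5)) / 2  # largest singular value of the shear
estimates = (sup_of_quotient(shear), sup_on_sphere(shear), sup_on_ball(shear))
print(all(abs(e - exact) < 1e-4 for e in estimates))
```

Sampling the whole ball is visibly wasteful, since the maximum is attained on the boundary; this is the sense in which the last characterization is "inefficient but still useful".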


A linear map $T: V \to W$ is bounded below by $b$ if $\|T(\xi)\| \ge b\|\xi\|$ for all $\xi$ in $V$.
If $T$ has a bounded inverse and $m = \|T^{-1}\|$, then $T$ is bounded below by $1/m$,
for $\|T^{-1}(\eta)\| \le m\|\eta\|$ for all $\eta \in W$ if and only if $\|\xi\| \le m\|T(\xi)\|$ for all $\xi \in V$.




If $V$ is finite-dimensional, then it is true, conversely, that if $T$ is bounded below,
then it is invertible (why?), but in general this does not follow.

If V and W are normed linear spaces, then Hom(V, W) is defined to be the 
set of all bounded linear maps T: V — W. The results of Section 2.3 all remain 
true, but require some additional arguments. 


Theorem 3.2. $\mathrm{Hom}(V, W)$ is itself a normed linear space if $\|T\|$ is defined
as above, as the smallest bound for $T$.


Proof. This follows from the uniform norm discussion of Section 2 by virtue
of the identity $\|T\| = \|T \upharpoonright B_1\|_\infty$. □


Theorem 3.3. If $U$, $V$, and $W$ are normed linear spaces, and if
$T \in \mathrm{Hom}(U, V)$ and $S \in \mathrm{Hom}(V, W)$, then $S \circ T \in \mathrm{Hom}(U, W)$ and
$\|S \circ T\| \le \|S\|\,\|T\|$. It follows that composition on the right by a fixed $T$
is a bounded linear transformation from $\mathrm{Hom}(V, W)$ to $\mathrm{Hom}(U, W)$, and
similarly for composition on the left by a fixed $S$.


Proof.

$$\|(S \circ T)(\alpha)\| = \|S(T(\alpha))\| \le \|S\|\,\|T(\alpha)\| \le \|S\|(\|T\|\,\|\alpha\|) = (\|S\| \cdot \|T\|)\|\alpha\|.$$

Thus $S \circ T$ is bounded by $\|S\| \cdot \|T\|$, and everything else follows at once. □


As before, the conjugate space $V^*$ is $\mathrm{Hom}(V, \mathbb{R})$, now the space of all bounded
linear functionals.


EXERCISES 


3.1 Write out the $\epsilon,\delta$-proofs of the following limit theorems.

1) Let $V$ and $W$ be normed linear spaces, and let $F$ and $G$ be mappings from $V$ to $W$.
If $\lim_{\xi \to \alpha} F(\xi) = \mu$ and $\lim_{\xi \to \alpha} G(\xi) = \nu$, then $\lim_{\xi \to \alpha} (F + G)(\xi) = \mu + \nu$.

2) Given $F: V \to W$ and $g: V \to \mathbb{R}$, if $F(\xi) \to \mu$ and $g(\xi) \to b$ as $\xi \to \alpha$, then
$(gF)(\xi) \to b\mu$.

3.2 Prove that if $F(\xi) \to \mu$ as $\xi \to \alpha$ and $G(\eta) \to \lambda$ as $\eta \to \mu$, then $G \circ F(\xi) \to \lambda$ as
$\xi \to \alpha$. Give a careful, complete statement of the theorem you have proved.

3.3 Suppose that $A$ is an open subset of a normed linear space $V$ and that $\alpha_0 \in A$. Suppose that
$F: A \to \mathbb{R}$ is such that $\lim_{\alpha \to \alpha_0} F(\alpha) = b \ne 0$. Prove that $1/F(\alpha) \to 1/b$ as $\alpha \to \alpha_0$
($\epsilon,\delta$-proof).

3.4 The function $f(x) = |x|^r$ is continuous at $x = 0$ for any positive $r$. Prove that $f$ is
not Lipschitz continuous at $x = 0$ if $r < 1$. Prove, however, that $f$ is Lipschitz
continuous at $x = a$ if $a > 0$. (Use the mean-value theorem.)

3.5 Use the mean-value theorem of the calculus and the definition of the derivative
to show that if $f$ is a real-valued function on an interval $I$, and if $f'$ exists everywhere,
then $f$ is a Lipschitz mapping if and only if $f'$ is a bounded function. Show also that
then $\|f'\|_\infty$ is the smallest Lipschitz constant $C$.




3.6 The "working rules" for $\|T\|$ are
1) $\|T(\xi)\| \le \|T\|\,\|\xi\|$ for all $\xi$;
2) $\|T(\xi)\| \le b\|\xi\|$, all $\xi$ $\Rightarrow$ $\|T\| \le b$.
Prove these rules.

3.7 Prove that if we use the one-norm $\|x\|_1 = \sum_1^n |x_i|$ on $\mathbb{R}^n$, then the norm of the
linear functional

$$l_a: x \mapsto \sum_1^n a_i x_i$$

is $\|a\|_\infty$.

3.8 Prove similarly that if $\|x\| = \|x\|_\infty$, then $\|l_a\| = \|a\|_1$.

3.9 Use the above exercises to show that if $\|x\|$ on $\mathbb{R}^n$ is the one-norm, then

$$\|x\| = \operatorname{lub}\,\{|f(x)| : f \in (\mathbb{R}^n)^* \text{ and } \|f\| \le 1\}.$$

3.10 Show that if $T$ in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ has matrix $t = \{t_{ij}\}$, and if we use the
one-norm $\|x\|_1$ on $\mathbb{R}^n$ and the uniform norm $\|y\|_\infty$ on $\mathbb{R}^m$, then $\|T\| = \|t\|_\infty$.

3.11 Show that the meaning of '$\mathrm{Hom}(V, W)$' has changed by giving an example of a
linear mapping that fails to be bounded. There is one in the text.

3.12 For a fixed $\xi$ in $V$ define the mapping $\mathrm{ev}_\xi: \mathrm{Hom}(V, W) \to W$ by $\mathrm{ev}_\xi(T) = T(\xi)$.
Prove that $\mathrm{ev}_\xi$ is a bounded linear mapping.

3.13 In the above exercise it is in fact true that $\|\mathrm{ev}_\xi\| = \|\xi\|$, but to prove this we
need a new theorem.

Theorem. Given $\xi$ in the normed linear space $V$, there exists a functional $f$ in $V^*$
such that $\|f\| = 1$ and $|f(\xi)| = \|\xi\|$.

Assuming this theorem, prove that $\|\mathrm{ev}_\xi\| = \|\xi\|$. [Hint: Presumably you have already
shown that $\|\mathrm{ev}_\xi\| \le \|\xi\|$. You now need a $T$ in $\mathrm{Hom}(V, W)$ such that $\|T\| = 1$ and
$\|T(\xi)\| = \|\xi\|$. Consider a suitable dyad.]

3.14 Let $t = \{t_{ij}\}$ be a square matrix, and define $\|t\|$ as $\max_i (\sum_j |t_{ij}|)$. Prove that
this is a norm on the space $\mathbb{R}^{n \times n}$ of all $n \times n$ matrices. Prove that $\|st\| \le \|s\| \cdot \|t\|$.
Compute the norm of the identity matrix.

3.15 Let $V$ be the normed linear space $\mathbb{R}^n$ under the uniform norm $\|x\|_\infty = \max\{|x_i|\}$.
If $T \in \mathrm{Hom}\,V$, prove that $\|T\|$ is the norm of its matrix $\|t\|$ as defined in the above
exercise. That is, show that

$$\|T\| = \max_i \sum_j |t_{ij}|.$$

(Show first that $\|t\|$ is an upper bound of $T$, and then show that $\|T(x)\| = \|t\|\,\|x\|$ for
a specially chosen $x$.) Does part of the previous exercise now become superfluous?

3.16 Assume the following fact: If $f \in \mathcal{C}([0, 1])$ and $\|f\|_1 = a$, then given $\epsilon$, there is a
function $g \in \mathcal{C}([0, 1])$ such that

$$\|g\|_\infty = 1 \quad\text{and}\quad \int_0^1 fg > a - \epsilon.$$




Let $K(s, t)$ be continuous on $[0, 1] \times [0, 1]$ and bounded by $b$. Define $T: \mathcal{C}([0, 1]) \to
\mathcal{B}([0, 1])$ by $Th = k$, where

$$k(s) = \int_0^1 K(s, t)h(t)\,dt.$$

If $V$ and $W$ are the normed linear spaces $\mathcal{C}$ and $\mathcal{B}$ under the uniform norms, prove that

$$\|T\| = \operatorname{lub}_s \int_0^1 |K(s, t)|\,dt.$$

[Hint: Proceed as in the above exercise.]

3.17 Let $V$ and $W$ be normed linear spaces, and let $A$ be any subset of $V$ containing
more than one point. Let $\mathcal{L}(A, W)$ be the set of all Lipschitz mappings from $A$ to $W$.
For $f$ in $\mathcal{L}(A, W)$, let $p(f)$ be the smallest Lipschitz constant for $f$. That is,

$$p(f) = \operatorname{lub}_{\xi \ne \eta} \frac{\|f(\xi) - f(\eta)\|}{\|\xi - \eta\|}.$$

Prove that $\mathcal{L}(A, W)$ is a vector space $\mathcal{V}$ and that $p$ is a seminorm on $\mathcal{V}$.

3.18 Continuing the above exercise, show that if $\alpha$ is any fixed point of $A$, then
$p(f) + \|f(\alpha)\|$ is a norm on $\mathcal{V}$.

3.19 Let $K$ be a mapping from a subset $A$ of a normed linear space $V$ to $V$ which
differs from the identity by a Lipschitz mapping with constant $c$ less than 1. We may
as well take $c = \tfrac12$, and then our hypothesis is that

$$\|K(\xi) - K(\eta) - (\xi - \eta)\| \le \tfrac12\|\xi - \eta\|.$$

Prove that $K$ is injective and that its inverse is a Lipschitz mapping with constant 2.

3.20 Continuing the above exercise, suppose in addition that the domain $A$ of $K$ is
an open subset of $V$ and that $K[C]$ is a closed set whenever $C$ is a closed ball lying in $A$.
Prove that if $C = C_r(\alpha)$, the closed ball of radius $r$ about $\alpha$, is a subset of $A$, then
$K[C]$ includes the ball $B = B_{r/7}(\gamma)$, where $\gamma = K(\alpha)$. This proof is elementary but
tricky. If there is a point $\nu$ of $B$ not in $K[C]$, then since $K[C]$ is closed, there is a largest
ball $B'$ about $\nu$ disjoint from $K[C]$ and a point $\eta = K(\xi)$ in $K[C]$ as close to $B'$ as we
wish. Now if we change $\xi$ by adding $\nu - \eta$, the change in the value of $K$ will
approximate $\nu - \eta$ closely enough to force the new value of $K$ to be in $B'$. If we can also show
that the new value $\xi + (\nu - \eta)$ is in $C$, then this new value of $K$ is in $K[C]$, and we
have our contradiction.

Draw a picture. Obviously, the radius $\rho$ of $B'$ is at most $r/7$. Show that if
$\eta = K(\xi)$ is chosen so that $\|\nu - \eta\| \le \tfrac32\rho$, then the above assertions follow from the
triangle inequality, and the Lipschitz inequality displayed in Exercise 3.19. You have
to prove that

$$\|K(\xi + (\nu - \eta)) - \nu\| < \rho$$

and

$$\|(\xi + (\nu - \eta)) - \alpha\| < r.$$

3.21 Assume the result of the above exercise and show that

$$B_{r/7}(\gamma) \subset K[B_r(\alpha)].$$




Show, therefore, that $K[A]$ is an open subset of $V$. State a theorem about the Lipschitz
invertibility of $K$, including all the hypotheses on $K$ that were used in the above
exercises.

3.22 We shall see in the next chapter that if $V$ and $W$ are finite-dimensional spaces,
then any continuous map from $V$ to $W$ takes bounded closed sets into bounded closed
sets. Assuming this and the results of the above exercises, prove the following theorem.


Theorem. Let $F$ be a mapping from an open subset $A$ of a finite-dimensional
normed linear space $V$ to a finite-dimensional normed linear space $W$. Suppose
that there is a $T$ in $\mathrm{Hom}(V, W)$ such that $T^{-1}$ exists and such that $F - T$ is
Lipschitz on $A$, with constant $1/2m$, where $m = \|T^{-1}\|$. Then $F$ is injective, its
range $R = F[A]$ is an open subset of $W$, and its inverse $F^{-1}$ is Lipschitz
continuous, with constant $2m$.


4. EQUIVALENT NORMS


Two normed linear spaces $V$ and $W$ are norm isomorphic if there is a bijection $T$
from $V$ to $W$ such that $T \in \mathrm{Hom}(V, W)$ and $T^{-1} \in \mathrm{Hom}(W, V)$. That is, an
isomorphism is a linear isomorphism $T$ such that both $T$ and $T^{-1}$ are continuous
(bounded). As usual, we regard isomorphic spaces as being essentially the same.
For two different norms on the same space we are led to the following definition.


Definition. Two norms $p$ and $q$ on the same vector space $V$ are equivalent
if there exist constants $a$ and $b$ such that $p \le aq$ and $q \le bp$.


Then $(1/b)q \le p \le aq$ and $(1/a)p \le q \le bp$, so that two norms are
equivalent if and only if either can be bracketed by two multiples of the other.
The above definition simply says that the identity map $\xi \mapsto \xi$ from $V$ to $V$,
considered as a map from the normed linear space $\langle V, p\rangle$ to the normed
linear space $\langle V, q\rangle$, is bounded in both directions, and hence that these two
normed linear spaces are isomorphic.

If $V$ is infinite-dimensional, two norms will in general not be equivalent.
For example, if $V = \mathcal{C}([0, 1])$ and $f_n(t) = t^n$, then $\|f_n\|_1 = 1/(n + 1)$ and
$\|f_n\|_\infty = 1$. Therefore, there is no constant $a$ such that $\|f\|_\infty \le a\|f\|_1$ for all
$f \in \mathcal{C}([0, 1])$, and the norms $\|\ \|_\infty$ and $\|\ \|_1$ are not equivalent on $V = \mathcal{C}([0, 1])$.
This is why the very notion of a normed linear space depends on the assumption
of a given norm.
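The two norms of $f_n(t) = t^n$ can be tabulated directly. The sketch below is illustrative only; the function names are hypothetical, the exact values are the ones quoted in the text, and the midpoint sum is included merely as an independent numerical check of $\|f_n\|_1$.

```python
from fractions import Fraction

def one_norm_exact(n):
    """Exact one-norm of f_n(t) = t**n on [0, 1]: the integral of t**n is 1/(n + 1)."""
    return Fraction(1, n + 1)

def one_norm_midpoint(n, steps=100_000):
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** n for i in range(steps))

def uniform_norm(n):
    """Uniform norm of f_n on [0, 1]: t**n is increasing, so the maximum is f_n(1) = 1."""
    return Fraction(1)

# The quotient ||f_n||_inf / ||f_n||_1 equals n + 1, so no single constant a
# can satisfy ||f||_inf <= a * ||f||_1 for every f.
ratios = [uniform_norm(n) / one_norm_exact(n) for n in (1, 10, 100, 1000)]
print(ratios)
print(abs(one_norm_midpoint(10) - float(one_norm_exact(10))) < 1e-8)
```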

However, we have the following theorem, which we shall prove in the next
chapter by more sophisticated methods than we are using at present.


Theorem 4.1. On a finite-dimensional vector space V all norms are equiva- 
lent. 


We shall need this theorem and also the following consequence of it occasion- 
ally in the present chapter. 


Theorem 4.2. If $V$ and $W$ are finite-dimensional normed linear spaces, then
every linear mapping $T$ from $V$ to $W$ is necessarily bounded.


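Proof. Because of the above theorem, it is sufficient to prove $T$ bounded with
respect to some pair of norms. Let $\theta: \mathbb{R}^n \to V$ and $\varphi: \mathbb{R}^m \to W$ be any basis
isomorphisms, and let $\{t_{ij}\}$ be the matrix of $\bar{T} = \varphi^{-1} \circ T \circ \theta$ in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$.
Then

$$\|\bar{T}x\|_\infty = \max_i \Big|\sum_j t_{ij} x_j\Big| \le \big(\max_{i,j} |t_{ij}|\big)\Big(\sum_j |x_j|\Big) = b\|x\|_1,$$

where $b = \max |t_{ij}|$. Now $q(\eta) = \|\varphi^{-1}(\eta)\|_\infty$ and $p(\xi) = \|\theta^{-1}(\xi)\|_1$ are norms
on $W$ and $V$ respectively, by Lemma 2.1, and since

$$q(T(\xi)) = \|\bar{T}(\theta^{-1}\xi)\|_\infty \le b\|\theta^{-1}\xi\|_1 = b\,p(\xi),$$

we see that $T$ is bounded by $b$ with respect to the norms $p$ and $q$ on $V$ and $W$. □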
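The key estimate in this proof, $\|\bar{T}x\|_\infty \le b\|x\|_1$ with $b = \max |t_{ij}|$, is easy to test on a sample matrix. The sketch below assumes a randomly generated $3 \times 4$ matrix and random test vectors; the names `apply` and `worst_ratio` are introduced here for illustration, and the run is a spot check rather than a proof.

```python
import random

def apply(t, x):
    """Apply the matrix t (a list of rows) to the vector x."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

random.seed(2)  # fixed seed so the run is reproducible
t = [[random.uniform(-3, 3) for _ in range(4)] for _ in range(3)]
b = max(abs(tij) for row in t for tij in row)  # b = max |t_ij|

worst_ratio = 0.0
for _ in range(2000):
    x = [random.uniform(-1, 1) for _ in range(4)]
    uniform_norm_of_image = max(abs(y) for y in apply(t, x))
    one_norm_of_x = sum(abs(xj) for xj in x)
    worst_ratio = max(worst_ratio, uniform_norm_of_image / one_norm_of_x)

# Every sampled quotient ||Tx||_inf / ||x||_1 should stay at or below b.
print(worst_ratio <= b + 1e-12)
```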


If we change to an equivalent norm, we are merely passing through an
isomorphism, and all continuous linear properties remain unchanged. For
example:


Theorem 4.3. The vector space $\mathrm{Hom}(V, W)$ remains the same if either the
domain norm or the range norm is replaced by an equivalent norm, and the
two induced norms on $\mathrm{Hom}(V, W)$ are equivalent.


Proof. The proof is left to the reader. 


We now ask what kind of a norm we might want on the Cartesian product
$V \times W$ of two normed linear spaces. It is natural to try to choose the product
norm so that the fundamental mappings relating the product space to the two
factor spaces, the two projections $\pi_i$ and the two injections $\theta_i$, should be
continuous. It turns out that these requirements determine the product norm
uniquely to within equivalence. For if $\|\langle\alpha, \xi\rangle\|$ has these properties, then

$$\|\langle\alpha, \xi\rangle\| = \|\langle\alpha, 0\rangle + \langle 0, \xi\rangle\| \le \|\langle\alpha, 0\rangle\| + \|\langle 0, \xi\rangle\| \le k_1\|\alpha\| + k_2\|\xi\| \le k(\|\alpha\| + \|\xi\|),$$

where $k_i$ is a bound of the injection $\theta_i$ and $k$ is the larger of $k_1$ and $k_2$. Also,
$\|\alpha\| \le c_1\|\langle\alpha, \xi\rangle\|$ and $\|\xi\| \le c_2\|\langle\alpha, \xi\rangle\|$, by the boundedness of the
projections $\pi_i$, and so $\|\alpha\| + \|\xi\| \le c\|\langle\alpha, \xi\rangle\|$, where $c = c_1 + c_2$. Now $\|\alpha\| + \|\xi\|$
is clearly a norm $\|\ \|_1$ on $V \times W$, and our argument above shows that
$\|\langle\alpha, \xi\rangle\|$ will satisfy our requirements if and only if it is equivalent to $\|\ \|_1$.
Any such norm will be called a product norm for $V \times W$. The product norms
most frequently used are the uniform (product) norm

$$\|\langle\alpha, \xi\rangle\|_\infty = \max\{\|\alpha\|, \|\xi\|\},$$

the Euclidean (product) norm $\|\langle\alpha, \xi\rangle\|_2 = (\|\alpha\|^2 + \|\xi\|^2)^{1/2}$, and the above
sum (product) norm $\|\langle\alpha, \xi\rangle\|_1$. We shall leave the verification that the
uniform and Euclidean norms actually are norms as exercises.
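That the three product norms are mutually equivalent can be seen from the chain $\|\ \|_\infty \le \|\ \|_2 \le \|\ \|_1 \le 2\|\ \|_\infty$ for a pair. This chain is a standard elementary fact stated here as an aside, not something the text has proved at this point. The sketch below checks it on random samples; since each product norm depends only on the two numbers $\|\alpha\|$ and $\|\xi\|$, the hypothetical helper `product_norms` takes those two nonnegative numbers as input.

```python
import math
import random

def product_norms(norm_alpha, norm_xi):
    """Given the factor norms ||alpha|| and ||xi||, return the sum, Euclidean,
    and uniform product norms of the pair <alpha, xi>."""
    sum_norm = norm_alpha + norm_xi
    euclid_norm = math.hypot(norm_alpha, norm_xi)
    uniform_norm = max(norm_alpha, norm_xi)
    return sum_norm, euclid_norm, uniform_norm

random.seed(1)  # fixed seed so the run is reproducible
tol = 1e-12     # slack for floating-point rounding
chain_holds = True
for _ in range(1000):
    na, nx = random.uniform(0, 10), random.uniform(0, 10)
    s, e, u = product_norms(na, nx)
    chain_holds = chain_holds and (u <= e + tol) and (e <= s + tol) and (s <= 2 * u + tol)

print(chain_holds)
```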

Each of these three product norms can be defined as well for $n$ factor spaces
as for two, and we gather the facts for this general case into a theorem.




Theorem 4.4. If $\{\langle V_i, p_i\rangle\}_1^n$ is a finite set of normed linear spaces, then
$\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$, defined on $V = \prod_{i=1}^n V_i$ by $\|\alpha\|_1 = \sum_1^n p_i(\alpha_i)$,
$\|\alpha\|_2 = (\sum_1^n p_i(\alpha_i)^2)^{1/2}$, and $\|\alpha\|_\infty = \max\{p_i(\alpha_i) : i = 1, \ldots, n\}$, are
equivalent norms on $V$, and each is a product norm in the sense that the
projections $\pi_i$ and the injections $\theta_i$ are all continuous.


*It looks above as though all we are doing is taking any norm $\|\ \|$ on $\mathbb{R}^n$ and
then defining a norm $|||\ |||$ on the product space $V$ by

$$|||\alpha||| = \|\langle p_1(\alpha_1), \ldots, p_n(\alpha_n)\rangle\|.$$

This is almost correct. The interested reader will discover, however, that
$\|\ \|$ on $\mathbb{R}^n$ must have the property that if $|x_i| \le |y_i|$ for $i = 1, \ldots, n$, then
$\|x\| \le \|y\|$ for the triangle inequality to follow for $|||\ |||$ in $V$. If we call such a
norm on $\mathbb{R}^n$ an increasing norm, then the following is true.


If $\|\ \|$ is any increasing norm on $\mathbb{R}^n$, then $|||\alpha||| = \|\langle p_1(\alpha_1), \ldots, p_n(\alpha_n)\rangle\|$
is a product norm on $V = \prod_1^n V_i$.


However, we shall use only the 1-, 2-, $\infty$-product norms in this book.*


The triangle inequality, the continuity of addition, and our requirements on 
a product norm form a set of nearly equivalent conditions. In particular, we 
make the following observation. 


Lemma 4.1. If $V$ is a normed linear space, then the operation of addition
is a bounded linear map from $V \times V$ to $V$.


Proof. The triangle inequality for the norm on $V$ says exactly that addition is
bounded by 1 when the sum norm is used on $V \times V$. □


A normed linear space $V$ is a (norm) direct sum $\bigoplus_1^n V_i$ if the mapping
$\langle x_1, \ldots, x_n\rangle \mapsto \sum_1^n x_i$ is a norm isomorphism from $\prod_1^n V_i$ to $V$. That is, the
given norm on $V$ must be equivalent to the product norm it acquires when it is
viewed as $\prod_1^n V_i$. If $V$ is algebraically the direct sum $\bigoplus_1^n V_i$, we always have

$$\Big\|\sum_1^n x_i\Big\| \le \sum_1^n \|x_i\|$$

by the triangle inequality for the norm on $V$, and the sum on the right is the
one-norm for $\prod_1^n V_i$. Therefore, $V$ will be the norm direct sum $\bigoplus_1^n V_i$ if, conversely,
there is an $n$-tuple of constants $\{k_i\}$ such that $\|x_i\| \le k_i\|x\|$ for all $x$. This is
the same as saying that the projections $P_i: x \mapsto x_i$ are all bounded. Thus,


Theorem 4.5. If $V$ is a normed linear space and $V$ is algebraically the direct
sum $V = \bigoplus_1^n V_i$, then $V = \bigoplus_1^n V_i$ as normed linear spaces if and only if
the associated projections $\{P_i\}$ are all bounded.




EXERCISES 


4.1 The fact that Hom(V, W) is unchanged when norms are replaced by equivalent 
norms can be viewed as a corollary of Theorem 3.3. Show that this is so. 


4.2 Write down a string of quite obvious inequalities showing that the norms
$\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$ on $\mathbb{R}^n$ are equivalent. Discuss what happens as $n \to \infty$.


4.3 Let $V$ be an $n$-dimensional vector space, and consider the collection of all norms
on $V$ of the form $p \circ \theta$, where $\theta: V \to \mathbb{R}^n$ is a coordinate isomorphism and $p$ is one of
the norms $\|\ \|_1$, $\|\ \|_2$, $\|\ \|_\infty$ on $\mathbb{R}^n$. Show that all of these norms are equivalent. (Use
the above exercise and the reasoning in Theorem 4.2.)


4.4 Prove that $\|\langle\alpha, \xi\rangle\| = \max\{\|\alpha\|, \|\xi\|\}$ is a norm on $V \times W$.
4.5 Prove that $\|\langle\alpha, \xi\rangle\| = \|\alpha\| + \|\xi\|$ is a norm on $V \times W$.
4.6 Prove that $\|\langle\alpha, \xi\rangle\| = (\|\alpha\|^2 + \|\xi\|^2)^{1/2}$ is a norm on $V \times W$.


4.7 Assuming Exercises 4.4 through 4.6, prove by induction the corresponding part
of Theorem 4.4.


4.8 Prove that if $A$ is an open subset of $V \times W$, then $\pi_1[A]$ is an open subset of $V$.
4.9 Prove ($\epsilon$, $\delta$) that $\langle T, S\rangle \mapsto S \circ T$ is a continuous map from

$$\mathrm{Hom}(V_1, V_2) \times \mathrm{Hom}(V_2, V_3) \text{ to } \mathrm{Hom}(V_1, V_3),$$

where the $V_i$ are all normed linear spaces.
4.10 Let $\|\ \|$ be any increasing norm on $\mathbb{R}^n$; that is, $\|x\| \le \|y\|$ if $x_i \le y_i$ for all $i$.
Let $p_i$ be a norm on the vector space $V_i$ for $i = 1, \ldots, n$. Show that

$$|||\alpha||| = \|\langle p_1(\alpha_1), \ldots, p_n(\alpha_n)\rangle\|$$

is a norm on $V = \prod_1^n V_i$.


4.11 Suppose that $p: V \to \mathbb{R}$ is a nonnegative function such that $p(x\alpha) = |x|\,p(\alpha)$
for all $x$, $\alpha$. This is surely a minimum requirement for any function purporting to be a
measure of length of a vector.


a) Define continuity with respect to $p$ and show that Theorem 3.1 is valid.
b) Our next requirement is that addition be continuous as a map from $V \times V$ to $V$,
and we decide that continuity at 0 means that for every $\epsilon$ there is a $\delta$ such that

$$p(\alpha) < \delta \text{ and } p(\beta) < \delta \;\Rightarrow\; p(\alpha + \beta) < \epsilon.$$

Argue again as in Theorem 3.1 to show that there is a constant $c$ such that

$$p(\alpha + \beta) \le c\,(p(\alpha) + p(\beta)) \quad\text{for all } \alpha, \beta \in V.$$


4.12 Let $V$ and $W$ be normed linear spaces, and let $f: V \times W \to \mathbb{R}$ be bounded and
bilinear. Let $T$ be the corresponding linear map from $V$ to $W^*$. Prove that $T$ is bounded
and that $\|T\|$ is the smallest bound to $f$, that is, the smallest $b$ such that

$$|f(\alpha, \beta)| \le b\|\alpha\|\,\|\beta\| \quad\text{for all } \alpha, \beta.$$


4.13 Let the normed linear space $V$ be a norm direct sum $M \oplus N$. Prove that the
subspaces $M$ and $N$ are closed sets in $V$. (The converse theorem is false.)



4.14 Let $N$ be a closed subspace of the normed linear space $V$. If $A$ is a coset $N + \alpha$,
define $|||A|||$ as $\operatorname{glb}\,\{\|\xi\| : \xi \in A\}$. Prove that $|||A|||$ is a norm on the quotient space $V/N$.
Prove also that if $\bar{\xi}$ is the coset containing $\xi$, then the mapping $\xi \mapsto \bar{\xi}$ (the natural
projection $\pi$ of $V$ onto $V/N$) is bounded by 1.

4.15 Let $V$ and $W$ be normed linear spaces, and let $T$ in $\mathrm{Hom}(V, W)$ have a null space
which includes the closed subspace $N$. Prove that the unique linear $S$ from $V/N$ to $W$
defined by $T = S \circ \pi$ (Theorem 4.3 of Chapter 1) is bounded and that $\|S\| = \|T\|$.
4.16 Let $N$ be a closed subspace of a normed linear space $V$, and suppose that $N$ has a
finite-dimensional complement $M$ in the purely algebraic sense. Prove that then $V$ is the
norm direct sum $M \oplus N$. (Use the above exercise and Theorem 4.2 to prove that if $P$
is the projection of $V$ onto $N$ along $M$, then $P$ is bounded.)

4.17 Let $N_1$ and $N_2$ be closed subspaces of the normed linear space $V$, and suppose
that they have the same finite codimension. Prove that $N_1$ and $N_2$ are norm
isomorphic. (Assume the results of the above exercise and Exercise 2.11 of Chapter 2.)

4.18 Prove that if $p$ is a seminorm on a vector space $V$, then its null set is a subspace
$N$, $p$ is constant on the cosets of $N$, and $p$ factors: $p = q \circ \pi$, where $q$ is a norm on $V/N$
and $\pi$ is the natural projection $\xi \mapsto \bar{\xi}$ of $V$ onto $V/N$. Note that $\xi \mapsto \bar{\xi}$ is thus an
isometric surjection from the seminormed space $V$ to the normed space $V/N$. An
isometry is a distance-preserving map.


5. INFINITESIMALS


The notion of an infinitesimal was abused in the early literature of the calculus,
its treatment generally amounting to logical nonsense, and the term fell into
such disrepute that many modern books avoid it completely. Nevertheless, it
is a very useful idea, and we shall base our development of the differential upon
the properties of two special classes of infinitesimals which we shall call "big oh"
and "little oh" (and designate '$\mathcal{O}$' and '$o$', respectively).

Originally an infinitesimal was considered to be a number that "is infinitely
small but not zero". Of course, there is no such number. Later, an infinitesimal
was considered to be a variable that approaches zero as its limit. However, we
know that it is functions that have limits, and a variable can be considered to
have a limit only if it is somehow considered to be a function. We end up looking
at functions $\varphi$ such that $\varphi(t) \to 0$ as $t \to 0$. The definition of derivative involves
several such infinitesimals. If $f'(x)$ exists and has the value $a$, then the
fundamental difference quotient $(f(x + t) - f(x))/t$ is the quotient of two
infinitesimals, and, furthermore, $((f(x + t) - f(x))/t) - a$ also approaches 0 as $t \to 0$.
This last function is not defined at 0, but we can get around this if we wish by
multiplying through by $t$, obtaining

$$(f(x + t) - f(x)) - at = \varphi(t),$$

where $f(x + t) - f(x)$ is the "change in $f$" infinitesimal, $at$ is a linear infinitesimal,
and $\varphi(t)$ is an infinitesimal that approaches 0 faster than $t$ (i.e., $\varphi(t)/t \to 0$ as $t \to 0$).
If we divide the last equation by $t$ again, we see that this property of the
infinitesimal $\varphi$, that it converges to 0 faster than $t$ as $t \to 0$, is exactly equivalent to
the fact that the difference quotient of $f$ converges to $a$. This makes it clear that
the study of derivatives is included in the study of the rate at which
infinitesimals get small, and the usefulness of this paraphrase will shortly become clear.
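The statement that the remainder $\varphi(t) = (f(x + t) - f(x)) - at$ tends to 0 faster than $t$ can be watched numerically. The sketch below is an illustration under assumptions: it takes $f = \sin$ at $x = 1$, so that $a = \cos 1$, and the helper name `remainder` is introduced here. It prints the quotients $|\varphi(t)|/t$ for a few shrinking values of $t$.

```python
import math

def remainder(f, x, a, t):
    """phi(t) = (f(x + t) - f(x)) - a*t, the part of the change in f
    that is left after removing the linear infinitesimal a*t."""
    return (f(x + t) - f(x)) - a * t

x = 1.0
a = math.cos(x)  # the derivative of sin at x
ts = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [abs(remainder(math.sin, x, a, t)) / t for t in ts]

# The quotients phi(t)/t should shrink toward 0 as t does.
for t, q in zip(ts, quotients):
    print(t, q)
```

If a wrong slope is used in place of $\cos 1$, the quotients approach the nonzero error in the slope instead of 0, which is the equivalence the paragraph describes.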


Definition. A subset $A$ of a normed linear space $V$ is a neighborhood of a
point $\alpha$ if $A$ includes some open ball about $\alpha$. A deleted neighborhood of $\alpha$ is a
neighborhood of $\alpha$ minus the point $\alpha$ itself.


We define special sets of functions $\mathcal{I}$, $\mathcal{O}$, and $o$ as follows. It will be assumed
in these definitions that each function is from a neighborhood of 0 in a normed
linear space $V$ to a normed linear space $W$.


$f \in \mathcal{I}$ if $f(0) = 0$ and $f$ is continuous at 0. These functions are the
infinitesimals.


$f \in \mathcal{O}$ if $f(0) = 0$ and $f$ is Lipschitz continuous at 0. That is, there exist
positive constants $r$ and $c$ such that $\|f(\xi)\| \le c\|\xi\|$ on $B_r(0)$.


$f \in o$ if $f(0) = 0$ and $\|f(\xi)\|/\|\xi\| \to 0$ as $\xi \to 0$.


When the spaces $V$ and $W$ are not understood, we specify them by writing
$\mathcal{O}(V, W)$, etc.

A simple set of functions from $\mathbb{R}$ to $\mathbb{R}$ makes the qualitative difference
between these classes apparent. The function $f(x) = |x|^{1/2}$ is in $\mathcal{I}(\mathbb{R}, \mathbb{R})$ but not
in $\mathcal{O}$, $g(x) = x$ is in $\mathcal{O}$ and therefore in $\mathcal{I}$ but not in $o$, and $h(x) = x^2$ is in all
three classes (Fig. 3.7).


Fig. 3.7 
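The difference between the three example functions shows up in the quotient $|f(x)|/|x|$ near 0: it is unbounded for $|x|^{1/2}$, constant for $x$, and tends to 0 for $x^2$. The sketch below tabulates this quotient at a few points approaching 0; the dictionary of examples and the sample points are choices made for this illustration only.

```python
import math

examples = {
    "sqrt|x|": lambda x: math.sqrt(abs(x)),  # an infinitesimal, but not big-oh
    "x": lambda x: x,                         # big-oh, but not little-oh
    "x^2": lambda x: x * x,                   # in all three classes
}

xs = [10.0 ** (-k) for k in range(1, 7)]  # 0.1, 0.01, ..., 1e-6

# quotients[name][i] is |f(x)| / |x| at x = xs[i].
quotients = {
    name: [abs(f(x)) / abs(x) for x in xs]
    for name, f in examples.items()
}

for name, values in quotients.items():
    print(name, values)
```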


It is clear that $\mathcal{I}$, $\mathcal{O}$, and $o$ are unchanged when the norms on $V$ and $W$ are
replaced by equivalent norms.

Our previous notion of the sum of two functions does not apply to a pair
of functions $f, g \in \mathcal{I}(V, W)$ because their domains may be different. However,
$f + g$ is defined on the intersection $\operatorname{dom} f \cap \operatorname{dom} g$, which is still a neighborhood
of 0. Moreover, addition remains commutative and associative when extended
in this way. It is clear that then $\mathcal{I}(V, W)$ is almost a vector space. The only
trouble occurs in connection with the equation $f + (-f) = 0$; the domain of
the function on the left is $\operatorname{dom} f$, whereas we naturally take 0 to be the zero
function on the whole of $V$.




*The way out of this difficulty is to identify two functions $f$ and $g$ in $\mathcal{I}$ if
they are the same on some ball about 0. We define $f$ and $g$ to be equivalent
($f \sim g$) if and only if there exists a neighborhood of 0 on which $f = g$. We then
check (in our minds) that this is an equivalence relation and that we now do
have a vector space. Its elements are called germs of functions at 0. Strictly
speaking, a germ is thus an equivalence class of functions, but in practice one
tends to think of germs in terms of their representing functions, only keeping in
mind that two functions are the same as germs when they agree on a
neighborhood of 0.


As one might guess from our introductory discussion, the algebraic
properties of the three classes $\mathcal{I}$, $\mathcal{O}$, and $o$ are crucial for the differential calculus.
We gather them together in the following theorem.


Theorem 5.1


1) $o(V, W) \subset \mathcal{O}(V, W) \subset \mathcal{I}(V, W)$, and each of the three classes is closed
under addition and multiplication by scalars.


2) If $f \in \mathcal{O}(V, W)$, and if $g \in \mathcal{O}(W, X)$, then $g \circ f \in \mathcal{O}(V, X)$, where
$\operatorname{dom} g \circ f = f^{-1}[\operatorname{dom} g]$.

3) If either $f$ or $g$ above is in $o$, then so is $g \circ f$.

4) If $f \in \mathcal{O}(V, W)$ and $g \in \mathcal{I}(V, \mathbb{R})$, then $fg \in o(V, W)$, and similarly if
$f \in \mathcal{I}$ and $g \in \mathcal{O}$.


5) In (4) if either $f$ or $g$ is in $o$ and the other is merely bounded on a
neighborhood of 0, then $fg \in o(V, W)$.


6) $\mathrm{Hom}(V, W) \subset \mathcal{O}(V, W)$.

7) $\mathrm{Hom}(V, W) \cap o(V, W) = \{0\}$.

Proof. Let $\mathcal{L}_\epsilon(V, W)$ be the set of infinitesimals $f$ such that $\|f(\xi)\| \le \epsilon\|\xi\|$ on
some ball about 0. Then $f \in \mathcal{O}$ if and only if $f$ is in some $\mathcal{L}_\epsilon$, and $f \in o$ if and only
if $f$ is in every $\mathcal{L}_\epsilon$. Obviously, $o \subset \mathcal{O} \subset \mathcal{I}$.

1) If $\|f(\xi)\| \le a\|\xi\|$ on $B_t(0)$ and $\|g(\xi)\| \le b\|\xi\|$ on $B_u(0)$, then

$$\|f(\xi) + g(\xi)\| \le (a + b)\|\xi\|$$

on $B_r(0)$, where $r = \min\{t, u\}$. Thus $\mathcal{O}$ is closed under addition. The
closure of $o$ under addition follows similarly, or simply from the limit of a
sum being the sum of the limits.


2) If $\|f(\xi)\| \le a\|\xi\|$ when $\|\xi\| < t$ and $\|g(\eta)\| \le b\|\eta\|$ when $\|\eta\| < u$, then

$$\|g(f(\xi))\| \le b\|f(\xi)\| \le ab\|\xi\|$$

when $\|\xi\| < t$ and $\|f(\xi)\| < u$, and so when $\|\xi\| < r = \min\{t, u/a\}$.



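3) Now suppose that $f \in o$ in (2). Then, given $\epsilon$, we can take $a = \epsilon/b$ and have

$$\|g(f(\xi))\| \le \epsilon\|\xi\|$$

when $\|\xi\| < r$. Thus $g \circ f \in o$. The argument when $g \in o$ and $f \in \mathcal{O}$ is
essentially the same.

4) Given $\|f(\xi)\| \le c\|\xi\|$ on $B_r(0)$ and given $\epsilon$, we choose $\delta$ such that $|g(\xi)| \le
\epsilon/c$ on $B_\delta(0)$ and have

$$\|f(\xi)g(\xi)\| \le \epsilon\|\xi\|$$

when $\|\xi\| < \min(\delta, r)$. The other result follows similarly, as also does (5).

6) A bounded linear transformation is in $\mathcal{O}$ by definition.

7) Suppose that $f \in \mathrm{Hom}(V, W) \cap o(V, W)$. Take any $\alpha \ne 0$. Given $\epsilon$,
choose $r$ so that $\|f(\xi)\| \le \epsilon\|\xi\|$ on $B_r(0)$. Then write $\alpha$ as $\alpha = x\xi$, where
$\|\xi\| < r$. (Find $\xi$ and $x$.) Then

$$\|f(\alpha)\| = \|f(x\xi)\| = |x| \cdot \|f(\xi)\| \le |x| \cdot \epsilon \cdot \|\xi\| = \epsilon\|\alpha\|.$$

Thus $\|f(\alpha)\| \le \epsilon\|\alpha\|$ for every positive $\epsilon$, and so $f(\alpha) = 0$. Thus $f = 0$,
proving (7). □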


Remark. The additivity of $f$ was not used in this argument, only its homogeneity.
It follows therefore that there is no homogeneous function (of degree 1) in $o$
except 0.


Sometimes when more than one variable is present it is necessary to indicate
with respect to which variable a function is in $\mathcal{O}$ or $o$. We then write "$f(\xi) = \mathcal{O}(\xi)$"
for "$f \in \mathcal{O}$", where "$\mathcal{O}(\xi)$" is used to designate an arbitrary element of $\mathcal{O}$.

The following rather curious lemma will be useful later in our proof of the
differentiability of an implicitly defined function. It is understood that $\eta = f(\xi)$,
where $f$ is the function we are studying.


Lemma 5.1. If $\eta = \mathcal{O}(\xi) + o(\langle\xi, \eta\rangle)$ and also $\eta = \mathcal{I}(\xi)$, then $\eta = \mathcal{O}(\xi)$.


Proof. The hypotheses imply that there are numbers $b$, $r_1$, and $\rho$ such that
$\|\eta\| \le b\|\xi\| + \tfrac12(\|\xi\| + \|\eta\|)$ if $\|\xi\| < r_1$ and $\|\xi\| + \|\eta\| < \rho$, and then that
$\|\eta\| < \rho/2$ if $\|\xi\|$ is smaller than some $r_2$. If $\|\xi\| < r = \min\{r_1, r_2, \rho/2\}$, then
all the conditions are met and $\|\eta\| \le b\|\xi\| + \tfrac12(\|\xi\| + \|\eta\|)$. But this is the
inequality $\|\eta\| \le (2b + 1)\|\xi\|$, and so $\eta = \mathcal{O}(\xi)$. □


We shall also need the following straightforward result.


Lemma 5.2. If $f \in \mathcal{O}(V, X)$ and $g \in \mathcal{O}(V, Y)$, then $\langle f, g\rangle \in \mathcal{O}(V, X \times Y)$.
That is, $\langle\mathcal{O}(\xi), \mathcal{O}(\xi)\rangle = \mathcal{O}(\xi)$.


Proof. The proof is left to the reader. 




EXERCISES 


5.1 Prove in detail that the class $\mathcal{I}(V, W)$ is unchanged if the norms on $V$ and $W$
are replaced by equivalent norms.

5.2 Do the same for $\mathcal{O}$ and $o$.

5.3 Prove (5) of the $\mathcal{O}o$-theorem (Theorem 5.1).

5.4 Prove also that if in (4) either $f$ or $g$ is in $\mathcal{O}$ and the other is merely bounded on a
neighborhood of 0, then $fg \in \mathcal{O}(V, W)$.

5.5 Prove Lemma 5.2. (Remember that $F = \langle F_1, F_2\rangle$ is loose language for
$F = \theta_1 \circ F_1 + \theta_2 \circ F_2$.) State the generalization to $n$ functions. State the $o$-form of
the theorem.

5.6 Given $F_1 \in \mathcal{O}(V_1, W)$ and $F_2 \in \mathcal{O}(V_2, W)$, define $F$ from (a subset of) $V =
V_1 \times V_2$ to $W$ by $F(\alpha_1, \alpha_2) = F_1(\alpha_1) + F_2(\alpha_2)$. Prove that $F \in \mathcal{O}(V, W)$. (First
state the defining equation as an identity involving the projections $\pi_1$ and $\pi_2$ and not
involving explicit mention of the domain vectors $\alpha_1$ and $\alpha_2$.)

5.7 Given $F_1 \in \mathcal{O}(V_1, W)$ and $F_2 \in \mathcal{O}(V_2, \mathbb{R})$, define precisely what you mean by
$F_1 F_2$ and show that it is in $o(V_1 \times V_2, W)$.

5.8 Define the class $\mathcal{O}^n$ as follows: $f \in \mathcal{O}^n$ if $f \in \mathcal{I}$ and $\|f(\xi)\|/\|\xi\|^n$ is bounded in some
deleted ball about 0. (A deleted neighborhood of $\alpha$ is a neighborhood minus $\alpha$.) State
and prove a theorem about $f + g$ when $f \in \mathcal{O}^n$ and $g \in \mathcal{O}^m$.

5.9 State and prove a theorem about $f \circ g$ when $f \in \mathcal{O}^n$ and $g \in \mathcal{O}^m$.

5.10 State and prove a theorem about $fg$ when $f \in \mathcal{O}^n$ and $g \in \mathcal{O}^m$.

5.11 Define a similar class $o^n$. State and prove a theorem about $f \circ g$ when $f \in \mathcal{O}^n$
and $g \in o^m$.


6. THE DIFFERENTIAL 


Before considering the notion of the differential, we shall review some geometric
material from the elementary calculus. We do this for motivation only; our
subsequent theory is independent of the preliminary discussion.

In the elementary one-variable calculus the derivative $f'(a)$ of a function $f$
at the point $a$ has geometric meaning as the slope of the tangent line to the graph
of $f$ at the point $a$. (Of course, according to our notion of a function, the graph
of $f$ is $f$.) The tangent line thus has the (point-slope) equation $y - f(a) =
f'(a)(x - a)$, and is the graph of the affine map $x \mapsto f'(a)(x - a) + f(a)$.

We ordinarily examine the nature of the curve $f$ near the point $\langle a, f(a)\rangle$
by using new variables which are zero at this point. That is, we express
everything in terms of $s = y - f(a)$ and $t = x - a$. This change of variables is
simply the translation $\langle x, y\rangle \mapsto \langle t, s\rangle = \langle x - a, y - f(a)\rangle$ in the
Cartesian plane $\mathbb{R}^2$ which brings the point of interest $\langle a, f(a)\rangle$ to the origin.
If we picture the situation in a Euclidean plane, of which the next page is a
satisfactory local model, then this translation in $\mathbb{R}^2$ is represented by a choice of new
axes, the $t$- and $s$-axes, with origin at the point of tangency. Since $y = f(x)$
if and only if $s = f(a + t) - f(a)$, we see that the image of $f$ under this
translation is the function $\Delta f_a$ defined by $\Delta f_a(t) = f(a + t) - f(a)$. (See Fig. 3.8.)
Of course, $\Delta f_a$ is simply our old friend the change in $f$ brought about by changing
$x$ from $a$ to $a + t$.


[Fig. 3.8]


Similarly, the equation $y - f(a) = f'(a)(x - a)$ becomes $s = f'(a)t$, and
the tangent line accordingly translates to the line that is (the graph of) the
linear functional $l: t \mapsto f'(a)t$ having the number $f'(a)$ as its skeleton (matrix).
Remember that from the point of view of the geometric configuration (curve and
tangent line) in the Euclidean plane, all that we are doing is choosing the natural
axis system, with origin at the point of tangency. Then the curve is (the graph
of) the function $\Delta f_a$, and the tangent line is (the graph of) the linear map $l$.

Now it follows from the definition of $f'(a)$ that $l$ can also be characterized as
the linear function that approximates $\Delta f_a$ most closely. For, by definition,

$$\frac{\Delta f_a(t)}{t} \to f'(a) \quad \text{as } t \to 0,$$

and this is exactly the same as saying that

$$\frac{\Delta f_a(t) - l(t)}{t} \to 0, \quad \text{or} \quad \Delta f_a - l \in o.$$

But we know from the $\mathcal{O}o$-theorem that the expression of the function $\Delta f_a$ as the
sum $l + o$ is unique. This unique linear approximation $l$ is called the differential
of $f$ at $a$ and is designated $df_a$. Again, the differential of $f$ at $a$ is the linear function
$l: \mathbb{R} \to \mathbb{R}$ that approximates the actual change in $f$, $\Delta f_a$, in the sense that
$\Delta f_a - l \in o$; we saw above that if the derivative $f'(a)$ exists, then the differential
of $f$ at $a$ exists and has $f'(a)$ as its skeleton ($1 \times 1$ matrix).
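As a numerical illustration only (it plays no part in the argument), the following Python sketch tabulates the ratio $|\Delta f_a(t) - f'(a)t| / |t|$ for shrinking $t$. The choice $f = \sin$, $a = 1$ and the helper names `delta_f` and `error_ratio` are assumptions made for the example, not notation from the text.

```python
import math

def delta_f(f, a, t):
    """The local change of f at a: delta_f_a(t) = f(a + t) - f(a)."""
    return f(a + t) - f(a)

def error_ratio(f, fprime_a, a, t):
    """|delta_f_a(t) - l(t)| / |t| for the linear map l(t) = f'(a) * t."""
    return abs(delta_f(f, a, t) - fprime_a * t) / abs(t)

# Assumed example: f = sin at a = 1, so that f'(a) = cos(1).
a = 1.0
ratios = [error_ratio(math.sin, math.cos(a), a, 10.0 ** (-k)) for k in range(1, 6)]
for k, r in enumerate(ratios, start=1):
    print(f"t = 1e-{k}: ratio = {r:.3e}")
```

The ratios shrink roughly in proportion to $t$, which is what $\Delta f_a - l \in o$ leads one to expect for a twice-differentiable $f$.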

Similarly, if $f$ is a function of two variables, then (the graph of) $f$ is a surface
in Cartesian 3-space $\mathbb{R}^3 = \mathbb{R}^2 \times \mathbb{R}$, and the tangent plane to this surface at
$\langle a, b, f(a, b)\rangle$ has the equation $z - f(a, b) = f_1(a, b)(x - a) + f_2(a, b)(y - b)$,
where $f_1 = \partial f/\partial x$ and $f_2 = \partial f/\partial y$. If, as above, we set

$$\Delta f_{\langle a, b\rangle}(s, t) = f(a + s, b + t) - f(a, b)$$

and $l(s, t) = s f_1(a, b) + t f_2(a, b)$, then $\Delta f_{\langle a, b\rangle}$ is the change in $f$ around $\langle a, b\rangle$
and $l$ is the linear functional on $\mathbb{R}^2$ with matrix (skeleton) $\langle f_1(a, b), f_2(a, b)\rangle$.
Moreover, it is a theorem of the standard calculus that if the partial derivatives
of $f$ are continuous, then again $l$ approximates $\Delta f_{\langle a, b\rangle}$, with error in $o$. Here
also $l$ is called the differential of $f$ at $\langle a, b\rangle$ and is designated $df_{\langle a, b\rangle}$ (Fig. 3.9).
The notation in the figure has been changed to show the value at $t = \langle t_1, t_2\rangle$
of the differential $df_a$ of $f$ at $a = \langle a_1, a_2\rangle$.


The following definition should now be clear. As above, the local function
$\Delta F_\alpha$ is defined by $\Delta F_\alpha(\xi) = F(\alpha + \xi) - F(\alpha)$.


Definition. Let $V$ and $W$ be normed linear spaces, and let $A$ be a
neighborhood of $\alpha$ in $V$. A mapping $F: A \to W$ is differentiable at $\alpha$ if there is a $T$
in $\operatorname{Hom}(V, W)$ such that $\Delta F_\alpha(\xi) = T(\xi) + o(\xi)$.


The $\mathcal{O}o$-theorem implies then that $T$ is uniquely determined, for if also $\Delta F_\alpha =
S + o$, then $T - S \in o$, and so $T - S = 0$ by (7) of the theorem. This uniquely
determined $T$ is called the differential of $F$ at $\alpha$ and is designated $dF_\alpha$. Thus

$$\Delta F_\alpha = dF_\alpha + o,$$

where $dF_\alpha$ is the unique (bounded) linear approximation to $\Delta F_\alpha$.




* Our preliminary discussion should make it clear that this definition of the 
differential agrees with standard usage when the domain space is R". However, 
in certain cases when the domain space is an infinite-dimensional function space, 
dF, is called the first variation of F at a. This is due to the fact that although the 
early writers on the calculus of variations saw its analogy with the differential 
calculus, they did not realize that it was the same subject. + 


We gather together in the next two theorems the familiar rules for
differentiation. They follow immediately from the definition and the $\mathcal{O}o$-theorem.

It will be convenient to use the notation $\mathcal{D}_\alpha(V, W)$ for the set of all mappings
from neighborhoods of $\alpha$ in $V$ to $W$ that are differentiable at $\alpha$.


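Theorem 6.1

1) If $F \in \mathcal{D}_\alpha(V, W)$, then $\Delta F_\alpha \in \mathcal{O}(V, W)$.

2) If $F, G \in \mathcal{D}_\alpha(V, W)$, then $F + G \in \mathcal{D}_\alpha(V, W)$ and $d(F + G)_\alpha =
dF_\alpha + dG_\alpha$.

3) If $F \in \mathcal{D}_\alpha(V, \mathbb{R})$ and $G \in \mathcal{D}_\alpha(V, W)$, then $FG \in \mathcal{D}_\alpha(V, W)$ and $d(FG)_\alpha =
F(\alpha)\, dG_\alpha + dF_\alpha\, G(\alpha)$, the second term being a dyad.

4) If $F$ is a constant function on $V$, then $F$ is differentiable and $dF_\alpha = 0$.

5) If $F \in \operatorname{Hom}(V, W)$, then $F$ is differentiable at every $\alpha \in V$ and $dF_\alpha = F$.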


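Proof

1) $\Delta F_\alpha = dF_\alpha + o = \mathcal{O} + o = \mathcal{O}$ by (1) and (6) of the $\mathcal{O}o$-theorem.

2) It is clear that $\Delta(F + G)_\alpha = \Delta F_\alpha + \Delta G_\alpha$. Therefore, $\Delta(F + G)_\alpha =
(dF_\alpha + o) + (dG_\alpha + o) = (dF_\alpha + dG_\alpha) + o$ by (1) of the $\mathcal{O}o$-theorem. Since
$dF_\alpha + dG_\alpha \in \operatorname{Hom}(V, W)$, we have (2).

3) $$\begin{aligned}
\Delta(FG)_\alpha(\xi) &= F(\alpha + \xi)G(\alpha + \xi) - F(\alpha)G(\alpha) \\
&= \Delta F_\alpha(\xi)\,G(\alpha) + F(\alpha)\,\Delta G_\alpha(\xi) + \Delta F_\alpha(\xi)\,\Delta G_\alpha(\xi),
\end{aligned}$$
as the reader will see upon expanding and canceling. This is just the usual
device of adding and subtracting middle terms in order to arrive at the form
involving the $\Delta$'s. Thus

$$\Delta(FG)_\alpha = (dF_\alpha + o)G(\alpha) + F(\alpha)(dG_\alpha + o) + \mathcal{O}\mathcal{O} = dF_\alpha\, G(\alpha) + F(\alpha)\, dG_\alpha + o$$

by the $\mathcal{O}o$-theorem.

4) If $\Delta F_\alpha = 0$, then $dF_\alpha = 0$ by (7) of the $\mathcal{O}o$-theorem.

5) $\Delta F_\alpha(\xi) = F(\alpha + \xi) - F(\alpha) = F(\xi)$. Thus $\Delta F_\alpha = F \in \operatorname{Hom}(V, W)$. $\square$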
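The product rule (3) can be checked numerically on a small example. The sketch below is an illustration under assumed data: the maps `F` and `G`, the point `alpha`, and the direction are hypothetical choices, and the differentials `dF` and `dG` were worked out by hand from the partial derivatives. It measures how far $\Delta(FG)_\alpha(\xi)$ is from $F(\alpha)\,dG_\alpha(\xi) + dF_\alpha(\xi)\,G(\alpha)$, relative to the maximum norm of $\xi$.

```python
def F(p):
    """Assumed scalar-valued example F: R^2 -> R."""
    x, y = p
    return x * y + 1.0

def G(p):
    """Assumed vector-valued example G: R^2 -> R^2."""
    x, y = p
    return (x * x, x + y)

def dF(p, h):
    """Differential of F at p applied to h (computed by hand)."""
    x, y = p
    return y * h[0] + x * h[1]

def dG(p, h):
    """Differential of G at p applied to h (computed by hand)."""
    x, y = p
    return (2.0 * x * h[0], h[0] + h[1])

def FG(p):
    """The product of the scalar F(p) with the vector G(p)."""
    s = F(p)
    return tuple(s * c for c in G(p))

def product_rule(p, h):
    """F(p) dG_p(h) + dF_p(h) G(p), the claimed differential of FG at p."""
    return tuple(F(p) * u + dF(p, h) * v for u, v in zip(dG(p, h), G(p)))

def residual_ratio(p, h):
    """Max-norm of Delta(FG)_p(h) minus the product-rule value, divided by the max-norm of h."""
    moved = tuple(pi + hi for pi, hi in zip(p, h))
    delta = tuple(u - v for u, v in zip(FG(moved), FG(p)))
    err = max(abs(d - l) for d, l in zip(delta, product_rule(p, h)))
    return err / max(abs(c) for c in h)

alpha = (1.0, 2.0)
ratios = [residual_ratio(alpha, (0.3 * t, -0.5 * t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The ratios shrink with $t$, consistent with the remainder lying in $o$.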


The composite-function rule is somewhat more complicated. 


Theorem 6.2. If $F \in \mathcal{D}_\alpha(V, W)$ and $G \in \mathcal{D}_{F(\alpha)}(W, X)$, then $G \circ F \in \mathcal{D}_\alpha(V, X)$
and
$$d(G \circ F)_\alpha = dG_{F(\alpha)} \circ dF_\alpha.$$




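Proof. We have

$$\begin{aligned}
\Delta(G \circ F)_\alpha(\xi) &= G(F(\alpha + \xi)) - G(F(\alpha)) \\
&= G(F(\alpha) + \Delta F_\alpha(\xi)) - G(F(\alpha)) \\
&= \Delta G_{F(\alpha)}(\Delta F_\alpha(\xi)) \\
&= dG_{F(\alpha)}(\Delta F_\alpha(\xi)) + o(\Delta F_\alpha(\xi)) \\
&= dG_{F(\alpha)}(dF_\alpha(\xi)) + dG_{F(\alpha)}(o(\xi)) + o \circ \mathcal{O} \\
&= (dG_{F(\alpha)} \circ dF_\alpha)(\xi) + \mathcal{O} \circ o + o \circ \mathcal{O}.
\end{aligned}$$

Thus $\Delta(G \circ F)_\alpha = dG_{F(\alpha)} \circ dF_\alpha + o$, and since $dG_{F(\alpha)} \circ dF_\alpha \in \operatorname{Hom}(V, X)$,
this proves the theorem. The reader should be able to justify each step taken in
this chain of equalities. $\square$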
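A small numerical check of the composite-function rule may help fix the idea. The following sketch is illustrative only: the maps `F` and `G`, the point `alpha`, and the direction are assumed examples, and `dF`, `dG` are their differentials computed by hand. It compares $\Delta(G \circ F)_\alpha(\xi)$ with $(dG_{F(\alpha)} \circ dF_\alpha)(\xi)$.

```python
def F(p):
    """Assumed example F: R^2 -> R^2."""
    x, y = p
    return (x * y, x + y * y)

def G(q):
    """Assumed example G: R^2 -> R."""
    u, v = q
    return u * v + u

def dF(p, h):
    """Differential of F at p applied to h (computed by hand)."""
    x, y = p
    return (y * h[0] + x * h[1], h[0] + 2.0 * y * h[1])

def dG(q, k):
    """Differential of G at q applied to k (computed by hand)."""
    u, v = q
    return (v + 1.0) * k[0] + u * k[1]

def chain(p, h):
    """(dG_{F(p)} o dF_p)(h): apply dF at p first, then dG at the point F(p)."""
    return dG(F(p), dF(p, h))

def residual_ratio(p, h):
    """|Delta(G o F)_p(h) - chain(p, h)| divided by the max-norm of h."""
    moved = (p[0] + h[0], p[1] + h[1])
    delta = G(F(moved)) - G(F(p))
    return abs(delta - chain(p, h)) / max(abs(h[0]), abs(h[1]))

alpha = (1.0, -2.0)
ratios = [residual_ratio(alpha, (0.4 * t, 0.7 * t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Note that `dG` is evaluated at `F(p)`, not at `p`; this is the subscript $F(\alpha)$ in the theorem.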




EXERCISES 


6.1 The coordinate mapping $\langle x, y\rangle \mapsto x$ from $\mathbb{R}^2$ to $\mathbb{R}$ is differentiable. Why?
What is its differential?


6.2 Prove that differentiation commutes with the application of bounded linear
maps. That is, show that if $F: V \to W$ is differentiable at $\alpha$ and if $T \in \operatorname{Hom}(W, X)$,
then $T \circ F$ is differentiable at $\alpha$ and $d(T \circ F)_\alpha = T \circ dF_\alpha$.


6.3 Prove that $F \in \mathcal{D}_\alpha(V, \mathbb{R})$ and $F(\alpha) \neq 0 \Rightarrow G = 1/F \in \mathcal{D}_\alpha(V, \mathbb{R})$ and

$$dG_\alpha = \frac{-dF_\alpha}{(F(\alpha))^2}.$$

6.4 Let $F: V \to \mathbb{R}$ be differentiable at $\alpha$, and let $f: \mathbb{R} \to \mathbb{R}$ be a function whose
derivative exists at $a = F(\alpha)$. Prove that $f \circ F$ is differentiable at $\alpha$ and that

$$d(f \circ F)_\alpha = f'(a)\, dF_\alpha.$$

[Remember that the differential of $f$ at $a$ is simply multiplication by its derivative:
$df_a(h) = hf'(a)$.] Show that the preceding problem is a special case.

6.5 Let $V$ and $W$ be normed linear spaces, and let $F: V \to W$ and $G: W \to V$ be
continuous maps such that $G \circ F = I_V$ and $F \circ G = I_W$. Suppose that $F$ is
differentiable at $\alpha$ and that $G$ is differentiable at $\beta = F(\alpha)$. Prove that

$$dG_\beta = (dF_\alpha)^{-1}.$$

6.6 Let $f: V \to \mathbb{R}$ be differentiable at $\alpha$. Show that $g = f^n$ is differentiable at $\alpha$ and
that

$$dg_\alpha = n f^{n-1}(\alpha)\, df_\alpha.$$

(Prove this both by an induction on the product rule and by the composite-function
rule, assuming in the second case that $D_x x^n = n x^{n-1}$.)




6.7 Prove from the product rule by induction that if the $n$ functions $f_i: V \to \mathbb{R}$,
$i = 1, \ldots, n$, are all differentiable at $\alpha$, then so is $f = \prod_1^n f_i$, and that

$$df_\alpha = \sum_{i=1}^{n} \Bigl( \prod_{j \neq i} f_j(\alpha) \Bigr)\, d(f_i)_\alpha.$$

6.8 A monomial of degree $n$ on the normed linear space $V$ is a product $\prod_1^n l_i$ of
linear functionals ($l_i \in V^*$). A homogeneous polynomial of degree $n$ is a finite sum of
monomials of degree $n$. A polynomial of degree $n$ is a sum of homogeneous polynomials
$p_i$, $i = 0, \ldots, n$, where $p_0$ is a constant. Show from the above exercise and other
known facts that a polynomial is differentiable everywhere.
6.9 Show that if $F_1: V \to W_1$ and $F_2: V \to W_2$ are both differentiable at $\alpha$, then
so is $F = \langle F_1, F_2\rangle$ from $V$ to $W = W_1 \times W_2$ (use the injections $\theta_1$ and $\theta_2$).

6.10 Show without using explicit computations, but using the results of earlier
exercises instead, that the mapping $F: \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$\langle x, y\rangle \mapsto \langle (x - y)^2, (x + y)^3\rangle$$

is everywhere differentiable. Now compute its differential at $\langle a, b\rangle$.
6.11 Let $F: V \to X$ and $G: W \to X$ be differentiable at $\alpha$ and $\beta$ respectively, and
define $K: V \times W \to X$ by

$$K(\xi, \eta) = F(\xi) + G(\eta).$$

Show that $K$ is differentiable at $\langle \alpha, \beta\rangle$

a) by a direct $\Delta$-calculation;
b) by using the projections $\pi_1$ and $\pi_2$ to express $K$ in terms of $F$ and $G$ without
explicit reference to the variable, and then applying the differentiation rules.


6.12 Now suppose given $F: V \to \mathbb{R}$ and $G: W \to X$, and define $K$ by

$$K(\xi, \eta) = F(\xi)G(\eta).$$

Show that if $F$ and $G$ are differentiable at $\alpha$ and $\beta$ respectively, then $K$ is differentiable
at $\langle \alpha, \beta\rangle$ in the manner of (b) in the above exercise.


6.13 Let $V$ and $W$ be normed linear spaces. Prove that the map $\langle \alpha, \beta\rangle \mapsto \|\alpha\|\,\|\beta\|$
from $V \times W$ to $\mathbb{R}$ is in $o(V \times W, \mathbb{R})$. Use the maximum norm on the product space.
Let $f: V \times W \to \mathbb{R}$ be bounded and bilinear. Here boundedness means that there
is some $b$ such that $|f(\alpha, \beta)| \leq b\|\alpha\|\,\|\beta\|$ for all $\alpha$, $\beta$. Prove that $f$ is differentiable
everywhere and find its differential.

6.14 Let $f$ and $g$ be differentiable functions from $\mathbb{R}$ to $\mathbb{R}$. We know from the
composite-function rule of the ordinary calculus that

$$(f \circ g)'(a) = f'(g(a))\,g'(a).$$

Our composite-function rule says that

$$d(f \circ g)_a = df_{g(a)} \circ dg_a,$$

where $df_x$ is the linear mapping $t \mapsto f'(x)t$. Show that these two statements are
equivalent.




6.15 Prove that $f(x, y) = \|\langle x, y\rangle\|_1 = |x| + |y|$ is differentiable except on the
coordinate axes (that is, $df_{\langle a, b\rangle}$ exists if $a$ and $b$ are both nonzero).

6.16 Comparing the shapes of the unit balls for $\|\ \|_1$ and $\|\ \|_\infty$ on $\mathbb{R}^2$, guess from the
above the theorem about the differentiability of $\|\ \|_\infty$. Prove it.

6.17 Let $V$ and $W$ be fixed normed linear spaces, let $X_d$ be the set of all maps from
$V$ to $W$ that are differentiable at 0, let $X_o$ be the set of all maps from $V$ to $W$ that
belong to $o(V, W)$, and let $X_l$ be $\operatorname{Hom}(V, W)$. Prove that $X_d$ and $X_o$ are vector spaces
and that $X_d = X_o \oplus X_l$.

6.18 Let $F$ be a Lipschitz function with constant $C$ which is differentiable at a point $\alpha$.
Prove that $\|dF_\alpha\| \leq C$.


7. DIRECTIONAL DERIVATIVES; THE MEAN-VALUE THEOREM 


Directional derivatives form the connecting link between differentials and the
derivatives of the elementary calculus, and, although they add one more concept
that has to be fitted into the scheme of things, the reader should find them
intuitively satisfying and technically useful.

A continuous function $f$ from an interval $I \subset \mathbb{R}$ to a normed linear space $W$
can have a derivative $f'(x)$ at a point $x \in I$ in exactly the sense of the elementary
calculus:

$$f'(x) = \lim_{t \to 0} \frac{f(x + t) - f(x)}{t}.$$

The range of such a function $f$ is a curve or arc in $W$, and it is conventional to
call $f$ itself a parametrized arc when we want to keep this geometric notion in
mind. We shall also call $f'(x)$, if it exists, the tangent vector to the arc $f$ at $x$.
This terminology fits our geometric intuition, as Fig. 3.10 suggests. For
simplicity we have set $x = 0$ and $f(x) = 0$. If $f'(x)$ exists, we say that the
parametrized arc $f$ is smooth at $x$. We also say that $f$ is smooth at $\alpha = f(x)$, but this
terminology is ambiguous if $f$ is not injective (i.e., if the arc crosses itself). An
arc is smooth if it is smooth at every value of the parameter.

We naturally wonder about the relationship between the existence of the
tangent vector $f'(x)$ and the differentiability of $f$ at $x$. If $df_x$ exists, then, being a
linear map on $\mathbb{R}$, it is simply multiplication "by" the fixed vector $\alpha$ that is its
skeleton, $df_x(h) = h\, df_x(1) = h\alpha$, and we expect $\alpha$ to be the tangent vector
$f'(x)$. We showed this and also the converse result for the ordinary calculus in
our preliminary discussion in Section 6. Actually, our argument was valid for
vector-valued functions, but we shall repeat it anyway.

[Fig. 3.10]

When we think of a vector-valued function of a real variable as being an
arc, we often use Greek letters like '$\lambda$' and '$\gamma$' for the function, as we do below.
This of course does not in any way change what is being proved, but is slightly
suggestive of a geometric interpretation.


Theorem 7.1. A parametrized arc $\gamma: [a, b] \to V$ is differentiable at $x \in (a, b)$
if and only if the tangent vector (derivative) $\alpha = \gamma'(x)$ exists, in which case
the tangent vector is the skeleton of the differential, $d\gamma_x(h) = h\gamma'(x) = h\alpha$.


Proof. If the parametrized arc $\gamma: [a, b] \to V$ is differentiable at $x \in (a, b)$, then
$d\gamma_x(h) = h\, d\gamma_x(1) = h\alpha$, where $\alpha = d\gamma_x(1)$. Since $\Delta\gamma_x - d\gamma_x \in o$, this gives
$\|\Delta\gamma_x(h) - h\alpha\| / |h| \to 0$, and so $\Delta\gamma_x(h)/h \to \alpha$ as $h \to 0$. Thus $\alpha$ is the derivative
$\gamma'(x)$ in the ordinary sense. By reversing the above steps we see that the
existence of $\gamma'(x)$ implies the differentiability of $\gamma$ at $x$. $\square$


Now let $F$ be a function from an open set $A$ in a normed linear space $V$ to a
normed linear space $W$. One way to study the behavior of $F$ in the neighborhood
of a point $\alpha$ in $A$ is to consider how it behaves on each straight line through $\alpha$.
That is, we study $F$ by temporarily restricting it to a one-dimensional domain.
The advantage gained in doing this is that the restricted $F$ is then simply a
parametrized arc, and its differential is simply multiplication by its ordinary
derivative.

For any nonzero $\xi \in V$ the straight line through $\alpha$ in the direction $\xi$ has the
parametric representation $t \mapsto \alpha + t\xi$. The restriction of $F$ to this line is the
parametrized arc $\gamma: \gamma(t) = F(\alpha + t\xi)$. Its tangent vector (derivative) at the
origin $t = 0$, if it exists, is called the derivative of $F$ in the direction $\xi$ at $\alpha$, or the
derivative of $F$ with respect to $\xi$ at $\alpha$, and is designated $D_\xi F(\alpha)$. Clearly,

$$D_\xi F(\alpha) = \lim_{t \to 0} \frac{F(\alpha + t\xi) - F(\alpha)}{t}.$$

Comparing this with our original definition of $f'$, we see that the tangent vector
$\gamma'(x)$ to a parametrized arc $\gamma$ is the directional derivative $D_1\gamma(x)$ with respect
to the standard basis vector 1 in $\mathbb{R}$.

Strictly speaking, we are misusing the word "direction", because different
vectors can have the same direction. Thus, if $\eta = c\xi$ with $c > 0$, then $\eta$ and $\xi$
point in the same direction, but, because $D_\xi F(\alpha)$ is linear in $\xi$ (as we shall see in a
moment), their associated derivatives are different: $D_\eta F(\alpha) = cD_\xi F(\alpha)$.

We now want to establish the relationship between directional derivatives,
which are vectors, and differentials, which are linear maps. We saw above that
for an arc $\gamma$ differentiability is equivalent to the existence of $\gamma'(x) = D_1\gamma(x)$.
In the general case the relationship is not as simple as it is for arcs, but in one
direction everything goes smoothly.




Theorem 7.2. If $F$ is differentiable at $\alpha$, and if $\lambda$ is any smooth arc through $\alpha$,
with $\alpha = \lambda(x)$, then $\gamma = F \circ \lambda$ is smooth at $x$, and $\gamma'(x) = dF_\alpha(\lambda'(x))$.
In particular, if $F$ is differentiable at $\alpha$, then every directional derivative
$D_\xi F(\alpha)$ exists, and $D_\xi F(\alpha) = dF_\alpha(\xi)$.


Proof. The smoothness of $\gamma$ is equivalent to its differentiability at $x$ and
therefore follows from the composite-function theorem. Moreover, $\gamma'(x) = d\gamma_x(1) =
d(F \circ \lambda)_x(1) = dF_\alpha(d\lambda_x(1)) = dF_\alpha(\lambda'(x))$. If $\lambda$ is the parametrized line
$\lambda(t) = \alpha + t\xi$, then it has the constant derivative $\xi$, and since $\alpha = \lambda(0)$ here,
the above formula becomes $\gamma'(0) = dF_\alpha(\xi)$. That is, $D_\xi F(\alpha) = \gamma'(0) =
dF_\alpha(\xi)$. $\square$
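The identity $D_\xi F(\alpha) = dF_\alpha(\xi)$ can be illustrated numerically. In the sketch below, the map `F`, the point `alpha`, and the direction `xi` are assumed examples; `dF` is the differential computed by hand from the partial derivatives; and `directional` uses a symmetric difference quotient, which only approximates the limit defining $D_\xi F(\alpha)$.

```python
import math

def F(p):
    """Assumed example F: R^2 -> R^2."""
    x, y = p
    return (math.exp(x) * y, x * x - y)

def dF(p, h):
    """Differential of F at p applied to h, from the partial derivatives computed by hand."""
    x, y = p
    return (math.exp(x) * (y * h[0] + h[1]), 2.0 * x * h[0] - h[1])

def directional(F, p, xi, t=1e-6):
    """Symmetric difference quotient for the arc gamma(t) = F(p + t*xi) at t = 0."""
    fwd = F((p[0] + t * xi[0], p[1] + t * xi[1]))
    bwd = F((p[0] - t * xi[0], p[1] - t * xi[1]))
    return tuple((u - v) / (2.0 * t) for u, v in zip(fwd, bwd))

alpha, xi = (0.5, 2.0), (1.0, -3.0)
numeric = directional(F, alpha, xi)                           # approximates D_xi F(alpha)
exact = dF(alpha, xi)                                         # dF_alpha(xi)
scaled = directional(F, alpha, (2.0 * xi[0], 2.0 * xi[1]))    # approximates D_{2 xi} F(alpha)
print(numeric, exact, scaled)
```

The last value is close to twice the first, in line with the earlier remark that $D_{c\xi}F(\alpha) = cD_\xi F(\alpha)$.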


It is not true, conversely, that the existence of all the directional derivatives
$D_\xi F(\alpha)$ of a function $F$ at a point $\alpha$ implies the differentiability of $F$ at $\alpha$. The
easiest counterexample involves the notion of a homogeneous function. We say
that a function $F: V \to W$ is homogeneous if $F(x\xi) = xF(\xi)$ for all $x$ and $\xi$.
For such a function the directional derivative $D_\xi F(0)$ exists because the arc
$\gamma(t) = F(0 + t\xi) = tF(\xi)$ is linear, and $\gamma'(0) = F(\xi)$. Thus, all of the directional
derivatives of a homogeneous function $F$ exist at 0 and $D_\xi F(0) = F(\xi)$. If $F$ is
also differentiable at 0, then $dF_0(\xi) = D_\xi F(0) = F(\xi)$ and $F = dF_0$. Thus a
differentiable homogeneous function must be linear. Therefore, any nonlinear
homogeneous function $F$ will be a function such that $D_\xi F(0)$ exists for all $\xi$ but
$dF_0$ does not exist. Taking the simplest possible situation, define $F: \mathbb{R}^2 \to \mathbb{R}$ by
$F(x, y) = x^3/(x^2 + y^2)$ if $\langle x, y\rangle \neq \langle 0, 0\rangle$ and $F(0, 0) = 0$. Then

$$F(tx, ty) = tF(x, y),$$

so that $F$ is homogeneous, but $F$ is not linear.
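The failure of linearity in this counterexample is easy to see numerically. The sketch below, with the hypothetical helper name `directional_at_origin`, evaluates the difference quotient at the origin in the directions $\langle 1, 0\rangle$, $\langle 0, 1\rangle$, and their sum.

```python
def F(x, y):
    """F(x, y) = x^3 / (x^2 + y^2) away from the origin, and F(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 / (x * x + y * y)

def directional_at_origin(xi, t=1e-3):
    """Difference quotient (F(0 + t*xi) - F(0)) / t; by homogeneity it equals F(xi) for any t != 0."""
    return (F(t * xi[0], t * xi[1]) - F(0.0, 0.0)) / t

e1, e2, s = (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)
d1, d2, ds = (directional_at_origin(v) for v in (e1, e2, s))
print(d1, d2, ds)
```

The three values are approximately 1, 0, and 1/2, so the derivative in the direction $\langle 1, 1\rangle$ is not the sum of the derivatives in the directions $\langle 1, 0\rangle$ and $\langle 0, 1\rangle$. The map $\xi \mapsto D_\xi F(0)$ is therefore not linear, and by Theorem 7.2 $F$ cannot be differentiable at 0.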

However, if $V$ is finite-dimensional, and if for each $\xi$ in a spanning set of
vectors the directional derivative $D_\xi F(\alpha)$ exists and is a continuous function of $\alpha$
on an open set $A$, then $F$ is continuously differentiable on $A$. The proof of this
fact depends on the mean-value theorem, which we take up next, but we shall
not complete it until Section 9 (Theorem 9.3).

The reader will remember the mean-value theorem as a cornerstone of the
calculus, and this is just as true in our general theory. We shall apply it in the
next section to give the proof of the general form of the above-mentioned
theorem, and practically all of our more advanced work will depend on it. The
ordinary mean-value theorem does not have an exact analogue here. Instead we
shall prove a theorem that in the one-variable calculus is an easy consequence of
the mean-value theorem.


Theorem 7.3. Let $f$ be a continuous function (parametrized arc) from a
closed interval $[a, b]$ to a normed linear space, and suppose that $f'(t)$ exists
and that $\|f'(t)\| \leq m$ for all $t \in (a, b)$. Then $\|f(b) - f(a)\| \leq m(b - a)$.


Proof. Fix $\epsilon > 0$, and let $A$ be the set of points $x \in [a, b]$ such that

$$\|f(x) - f(a)\| \leq (m + \epsilon)(x - a) + \epsilon.$$

$A$ includes at least a small interval $[a, c]$, because $f$ is continuous at $a$. Set
$l = \operatorname{lub} A$. Then $\|f(l) - f(a)\| \leq (m + \epsilon)(l - a) + \epsilon$ by the continuity of $f$ at $l$.
Thus $l \in A$, and $a < l \leq b$. We claim that $l = b$. For if $l < b$, then $f'(l)$
exists and $\|f'(l)\| \leq m$. Therefore, there is a $\delta$ such that

$$\|[f(x) - f(l)]/(x - l)\| < m + \epsilon$$

when $|x - l| \leq \delta$. It follows that

$$\begin{aligned}
\|f(l + \delta) - f(a)\| &\leq \|f(l + \delta) - f(l)\| + \|f(l) - f(a)\| \\
&\leq (m + \epsilon)\delta + (m + \epsilon)(l - a) + \epsilon \\
&= (m + \epsilon)(l + \delta - a) + \epsilon,
\end{aligned}$$

so that $l + \delta \in A$, a contradiction. Therefore, $l = b$. We thus have

$$\|f(b) - f(a)\| \leq (m + \epsilon)(b - a) + \epsilon,$$

and, since $\epsilon$ is arbitrary, $\|f(b) - f(a)\| \leq m(b - a)$. $\square$
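A concrete arc shows both that the inequality holds and why an equality of the form $f(b) - f(a) = f'(c)(b - a)$ cannot be expected for vector-valued $f$. The sketch below uses the unit circle in $\mathbb{R}^2$ as an assumed example; the bound `m` is estimated by sampling $\|f'\|$ on a grid, which is an illustration rather than a proof of the bound.

```python
import math

def f(t):
    """The unit circle as a parametrized arc in R^2 (an assumed example)."""
    return (math.cos(t), math.sin(t))

def fprime(t):
    """The tangent vector f'(t), computed by hand."""
    return (-math.sin(t), math.cos(t))

def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])

a, b = 0.0, 2.0 * math.pi
# Sampled estimate of a bound m for ||f'(t)|| on [a, b].
m = max(norm(fprime(a + (b - a) * k / 1000)) for k in range(1001))
chord = norm(tuple(u - v for u, v in zip(f(b), f(a))))
print(chord, m * (b - a))
```

Here $\|f'(t)\| = 1$ for every $t$, while $f(b) - f(a)$ vanishes up to rounding. The inequality $\|f(b) - f(a)\| \leq m(b - a)$ holds comfortably, but no $c$ can satisfy $f(b) - f(a) = f'(c)(b - a)$, since the right side has norm $2\pi$.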


The following more general version of the mean-value theorem is the form in
which it is ordinarily applied. As usual, $F$ and $G$ are from a subset of $V$ to $W$.


Theorem 7.4. If $F$ is differentiable in the ball $B_r(\alpha)$, and if $\|dF_\beta\| \leq \epsilon$ for
every $\beta$ in this ball, then $\|\Delta F_\beta(\xi)\| \leq \epsilon\|\xi\|$ whenever $\beta$ and $\beta + \xi$ are in the
ball. More generally, the same result holds if the ball $B_r(\alpha)$ is replaced by
any convex set $C$.


Proof. The segment from $\beta$ to $\beta + \xi$ is the range of the parametrized arc
$\lambda(t) = \beta + t\xi$ from $[0, 1]$ to $V$. If $\beta$ and $\beta + \xi$ are in the ball $B_r(\alpha)$, then this
segment is a subset of the ball. Setting $\gamma(t) = F(\beta + t\xi)$, we then have $\gamma'(x) =
dF_{\beta + x\xi}(\lambda'(x)) = dF_{\beta + x\xi}(\xi)$, from Theorem 7.2. Therefore, $\|\gamma'(x)\| \leq \epsilon\|\xi\|$ on
$[0, 1]$, and the mean-value theorem then implies that

$$\|\Delta F_\beta(\xi)\| = \|F(\beta + \xi) - F(\beta)\| = \|\gamma(1) - \gamma(0)\| \leq \epsilon\|\xi\|(1 - 0) = \epsilon\|\xi\|,$$

which is the desired inequality. The only property of $B_r(\alpha)$ that we have used is
that it includes the line segment joining any two of its points. This is the
definition of convexity, and the theorem is therefore true for any convex set. $\square$


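Corollary. If $G$ is differentiable on the convex set $C$, if $T \in \operatorname{Hom}(V, W)$,
and if $\|dG_\beta - T\| \leq \epsilon$ for all $\beta$ in $C$, then $\|\Delta G_\beta(\xi) - T(\xi)\| \leq \epsilon\|\xi\|$
whenever $\beta$ and $\beta + \xi$ are in $C$.


Proof. Set $F = G - T$, and note that $dF_\beta = dG_\beta - T$ and $\Delta F_\beta = \Delta G_\beta - T$. $\square$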


We end this section with a few words about notation. Notice the reversal
of the positions of the variables in the identity $(D_\xi F)(\alpha) = dF_\alpha(\xi)$. This
difference has practical importance. We have a function of the two variables '$\alpha$'
and '$\xi$' which we can convert to a function of one variable by holding the other
variable fixed; it is convenient technically to put the fixed variable in subscript
position. Thus we think of $dF_\alpha(\xi)$ with $\alpha$ held fixed and have the function $dF_\alpha$
in $\operatorname{Hom}(V, W)$, whereas in $(D_\xi F)(\alpha)$ we hold $\xi$ fixed and have the directional
derivative $D_\xi F: A \to W$ in the fixed direction $\xi$ as a function of $\alpha$, generalizing
the notation for any ordinary partial derivative $\partial F/\partial x_i(\alpha)$ as a function of $\alpha$.
We can also express this implication of the subscript position of a variable in the
dot notation (Section 0.10): when we write $D_\xi F(\alpha)$, we are thinking of the value
at $\alpha$ of the function $D_\xi F(\cdot)$.

Still a third notation that we shall use in later chapters puts the function
symbol in subscript position. We write

$$J_F(\alpha) = dF_\alpha.$$

This notation implies that the mapping $F$ is going to be fixed through a discussion
and gets it "out of the way" by putting it in subscript position.

If $F$ is differentiable at each point of the open set $A$, then we naturally
consider $dF$ to be the map $\alpha \mapsto dF_\alpha$ from $A$ to $\operatorname{Hom}(V, W)$. In the "$J$"-notation,
$dF = J_F$. Later in this chapter we are going to consider the differentiability
of this map at $\alpha$. This notion of the second differential $d^2F_\alpha = d(dF)_\alpha$ is probably
confusing at first sight, and a preliminary look at it now may ease the later
discussion. We simply have a new map $G = dF$ from an open set $A$ in a normed
linear space $V$ to a normed linear space $X = \operatorname{Hom}(V, W)$, and we consider its
differentiability at $\alpha$. If $dG_\alpha = d(dF)_\alpha$ exists, it is a linear map from $V$ to
$\operatorname{Hom}(V, W)$, and there is something special now. Referring back to Theorem 6.1
of Chapter 1, we know that $dG_\alpha = d^2F_\alpha$ is equivalent by duality to a bilinear
mapping $\omega$ from $V \times V$ to $W$: since $dG_\alpha(\xi)$ is itself a transformation in
$\operatorname{Hom}(V, W)$, we can evaluate it at $\eta$, and we define $\omega$ by

$$\omega(\xi, \eta) = (dG_\alpha(\xi))(\eta).$$

The dot notation may be helpful here. The mapping $\alpha \mapsto dF_\alpha$ is simply
$dF_{(\cdot)}$, and we have defined $G$ by $G(\cdot) = dF_{(\cdot)}$. Later, the fact that $dG_\alpha(\xi)$ is a
mapping can be emphasized by writing it as $dG_\alpha(\xi)(\cdot)$. In each case here we
have a function of one variable, and the dot only reminds us of that fact and
shows us where we shall put the variable when indicating an evaluation. In the
case of $\omega$ we have the original use of the dot, as in $\omega(\xi, \cdot) = dG_\alpha(\xi)$.


EXERCISES 


7.1 Given $f: \mathbb{R} \to \mathbb{R}$ such that $f'(a)$ exists, show that the "directional derivative"
$D_b f(a)$ has the value $bf'(a)$, by a direct evaluation of the limit of the difference quotient.


7.2 Let $f$ be a real-valued function on an $n$-dimensional space $V$, and suppose that $f$ is
differentiable at $\alpha \in V$. Show that the directions $\xi$ in which the derivative $D_\xi f(\alpha)$ is
zero make up an $(n - 1)$-dimensional subspace of $V$ (or the whole of $V$). What similar
conclusions can be drawn if $f$ maps $V$ to a two-dimensional space $W$?




7.3 a) Show by a direct argument on limits that if $f$ and $g$ are two functions from an
interval $I \subset \mathbb{R}$ to a normed linear space $V$, and if $f'(x)$ and $g'(x)$ both exist, then
$(f + g)'(x)$ exists and $(f + g)'(x) = f'(x) + g'(x)$.

b) Prove the same result as a corollary of Theorems 7.1 and 7.2 and the
differentiation rules of Section 6.

7.4 a) Given $f: I \to V$ and $g: I \to W$, show by a direct limit argument that if
$f'(x)$ and $g'(x)$ both exist, and if $F = \langle f, g\rangle: I \to V \times W$, then $F'(x)$ exists and
$F'(x) = \langle f'(x), g'(x)\rangle$.

b) Prove the same result from Theorems 7.1 and 7.2 and the differentiation rules of
Section 6, using the exact relation $F = \theta_1 \circ f + \theta_2 \circ g$.

7.5 In the spirit of the above two exercises, state a product law for derivatives of
arcs and prove it as in the (b) proofs above.

7.6 Find the tangent vector to the arc $\langle e^t, \sin t\rangle$ at $t = 0$; at $t = \pi/2$. [Apply
Exercise 7.4(a).] What is the differential of the above parametrized arc at these two
points? That is, if $f(t) = \langle e^t, \sin t\rangle$, what are $df_0$ and $df_{\pi/2}$?

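7.7 Let $F: \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping $\langle x, y\rangle \mapsto \langle 3x^2y, x^2y^3\rangle$. Compute the
directional derivative $D_{\langle 1, 2\rangle}F(3, -1)$

a) as the tangent vector at $\langle 3, -1\rangle$ to the arc $F \circ \lambda$, where $\lambda$ is the straight line
through $\langle 3, -1\rangle$ in the direction $\langle 1, 2\rangle$;
b) by first computing $dF_{\langle 3, -1\rangle}$ and then evaluating at $\langle 1, 2\rangle$.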

7.8 Let $\lambda$ and $\mu$ be any two linear functionals on a vector space $V$. Evaluate the
product $f(\xi) = \lambda(\xi)\mu(\xi)$ along the line $\xi = t\alpha$, and hence compute $D_\alpha f(\alpha)$. Now
evaluate $f$ along the general line $\xi = t\alpha + \beta$, and from it compute $D_\alpha f(\beta)$.

7.9 Work the above exercise by computing differentials.


7.10 If $f: \mathbb{R}^n \to \mathbb{R}$ is differentiable at $a$, we know that its differential $df_a$, being a
linear functional on $\mathbb{R}^n$, is given by its skeleton $n$-tuple $L$ according to the formula

$$df_a(x) = (L, x) = \sum_1^n l_i x_i.$$

In this context we call the $n$-tuple $L$ the gradient of $f$ at $a$. Show from the Schwarz
inequality (Exercise 2.3) that if we use vectors $y$ of Euclidean length 1, then the
directional derivative $D_y f(a)$ is maximum when $y$ points in the direction of the gradient
of $f$.

7.11 Let $W$ be a normed linear space, and let $V$ be the set of parametrized arcs
$\lambda: [-1, 1] \to W$ such that $\lambda(0) = 0$ and $\lambda'(0)$ exists. Show that $V$ is a vector space and
that $\lambda \mapsto \lambda'(0)$ is a surjective linear mapping from $V$ to $W$. Describe in words the
elements of the quotient space $V/N$, where $N$ is the null space of the above map.


7.12 Find another homogeneous nonlinear function. Evaluate its directional
derivatives $D_\xi F(0)$, and show again that they do not make up a linear map.

7.13 Prove that if $F$ is a differentiable mapping from an open ball $B$ of a normed
linear space $V$ to a normed linear space $W$ such that $dF_\alpha = 0$ for every $\alpha$ in $B$, then $F$
is a constant function.

7.14 Generalize the above exercise to the case where the domain of $F$ is an open set $A$
with the property that any two points of $A$ can be joined by a smooth arc lying in $A$.
Show by a counterexample that the result does not generalize to arbitrary open sets $A$
as the domain of $F$.

7.15 Prove the following generalization of the mean-value theorem. Let $f$ be a
continuous mapping from the closed interval $[a, b]$ to a normed linear space $V$, and let $g$
be a continuous real-valued function on $[a, b]$. Suppose that $f'(t)$ and $g'(t)$ both exist
at all points of the open interval $(a, b)$ and that $\|f'(t)\| \leq g'(t)$ on $(a, b)$. Then

$$\|f(b) - f(a)\| \leq g(b) - g(a).$$

[Consider the points $x$ such that $\|f(x) - f(a)\| \leq g(x) - g(a) + \epsilon(x - a) + \epsilon$.]


8. THE DIFFERENTIAL AND PRODUCT SPACES


In this section we shall relate the differentiation rules to the special configurations
resulting from the expression of a vector space as a finite Cartesian product.
When dealing with the range, this is a trivial consideration, but when the domain
is a product space, we become involved with a deeper theorem. These general
product considerations will be specialized to the $\mathbb{R}^n$-spaces in the next section,
but they also have a more general usefulness, as we shall see in the later sections
of this chapter and in later chapters.

We know that an $m$-tuple of functions on a common domain, $F^i: A \to W_i$,
$i = 1, \ldots, m$, is equivalent to a single $m$-tuple-valued function

$$F: A \to W = \prod_1^m W_i,$$

$F(\alpha)$ being the $m$-tuple $\{F^i(\alpha)\}_1^m$ for each $\alpha \in A$. We now check the obviously
necessary fact that $F$ is differentiable at $\alpha$ if and only if each $F^i$ is differentiable
at $\alpha$.


Theorem 8.1. Given $F^i: A \to W_i$, $i = 1, \ldots, m$, and $F = \langle F^1, \ldots, F^m\rangle$,
then $F$ is differentiable at $\alpha$ if and only if all the functions $F^i$ are, in which
case $dF_\alpha = \langle dF^1_\alpha, \ldots, dF^m_\alpha\rangle$.


Proof. Strictly speaking, $F = \sum_1^m \theta_i \circ F^i$, where $\theta_i$ is the injection of $W_i$ into
the product space $W = \prod_1^m W_i$ (see Section 1.3). Since each $\theta_i$ is linear and
hence differentiable, with $d(\theta_i)_\alpha = \theta_i$, we see that if each $F^i$ is differentiable at $\alpha$,
then so is $F$, and $dF_\alpha = \sum_1^m \theta_i \circ dF^i_\alpha$. Less exactly, this is the statement
$dF_\alpha = \langle dF^1_\alpha, \ldots, dF^m_\alpha\rangle$. The converse follows similarly from $F^i = \pi_i \circ F$,
where $\pi_i$ is the projection of $\prod_1^m W_i$ onto $W_i$. $\square$

Theorems 7.1 and 8.1 have the following obvious corollary (which can also
be proved as easily by a direct inspection of the limits involved).

Lemma 8.1. If $f_i$ is an arc from $[a, b]$ to $W_i$, for $i = 1, \ldots, n$, and if $f$ is
the $n$-tuple-valued arc $f = \langle f_1, \ldots, f_n\rangle$, then $f'(x)$ exists if and only if
$f_i'(x)$ exists for each $i$, in which case $f'(x) = \langle f_1'(x), \ldots, f_n'(x)\rangle$.


When the domain space $V$ is a product space $\prod_1^n V_i$ the situation is more
complicated. A function $F(\xi_1, \ldots, \xi_n)$ of $n$ vector variables does not decompose
into an equivalent $n$-tuple of functions. Moreover, although its differential
$dF_\alpha$ does decompose into an equivalent $n$-tuple of partial differentials $\{dF^i_\alpha\}$,
we do not have the simple theorem that $dF_\alpha$ exists if and only if the partial
differentials $dF^i_\alpha$ all exist.

Of course, we regard a function $F(\xi_1, \ldots, \xi_n)$ of $n$ vector variables as being
a function of the single $n$-tuple variable $\xi = \langle \xi_1, \ldots, \xi_n\rangle$, so that in principle
there is nothing new when we consider the differentiability of $F$. However, when
we consider a composition $F \circ G$, the inner function $G$ must now be an $n$-tuple-
valued function $G = \langle g^1, \ldots, g^n\rangle$, where $g^i$ is from an open subset $A$ of some
normed linear space $X$ to $V_i$, and we naturally try to express the differential
of $F \circ G$ in terms of the differentials $dg^i$. To accomplish this we need the partial
differentials $dF^j_\alpha$ of $F$. For the moment we shall define the $j$th partial differential
of $F$ at $\alpha = \langle \alpha_1, \ldots, \alpha_n\rangle$ as the restriction of the differential $dF_\alpha$ to $V_j$,
considered as a subspace of $V = \prod_1^n V_i$. As usual, this really involves the
injection $\theta_j$ of $V_j$ into $\prod_1^n V_i$, and our formal (temporary) definition, accordingly,
is

$$dF^j_\alpha = dF_\alpha \circ \theta_j.$$

Then, since $\xi = \langle \xi_1, \ldots, \xi_n\rangle = \sum_1^n \theta_i(\xi_i)$, we have

$$dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i).$$

Similarly, since $G = \langle g^1, \ldots, g^n\rangle = \sum_1^n \theta_i \circ g^i$, we have

$$d(F \circ G)_\gamma = \sum_1^n dF^i_{G(\gamma)} \circ dg^i_\gamma,$$

which we shall call the general chain rule. There is ambiguity in the "$i$"-superscripts
in this formula: to be more proper we should write $(dF)^i_\alpha$ and $d(g^i)_\gamma$.
We shall now work around to the real definition of a partial differential.
Since

$$\Delta F_\alpha \circ \theta_j = (dF_\alpha + o) \circ \theta_j = dF_\alpha \circ \theta_j + o = dF^j_\alpha + o,$$

we see that $dF^j_\alpha$ can be directly characterized, independently of $dF_\alpha$, as follows:
$dF^j_\alpha$ is the unique element $T_j$ of $\operatorname{Hom}(V_j, W)$ such that $\Delta F_\alpha \circ \theta_j = T_j + o$.

That is, $dF^j_\alpha$ is the differential at $\alpha_j$ of the function of the one variable $\xi_j$
obtained by holding the other variables in $F(\xi_1, \ldots, \xi_n)$ fixed at the values
$\xi_i = \alpha_i$. This is important because in practice it is often such partial
differentiability that we come upon as the primary phenomenon. We shall therefore
take this direct characterization as our definition of $dF^j_\alpha$, after which our
motivating calculation above is the proof of the following lemma.


Lemma 8.2. If $A$ is an open subset of a product space $V = \prod_1^n V_i$, and if
$F: A \to W$ is differentiable at $\alpha$, then all the partial differentials $dF^i_\alpha$ exist
and $dF^i_\alpha = dF_\alpha \circ \theta_i$.




The question then occurs as to whether the existence of all the partial
differentials $dF^i_\alpha$ implies the existence of $dF_\alpha$. The answer in general is negative,
as we shall see in the next section, but if all the partial differentials $dF^i_\alpha$ exist for
each $\alpha$ in an open set $A$ and are continuous functions of $\alpha$, then $F$ is continuously
differentiable on $A$. Note that Lemma 8.2 and the projection-injection identities
show us what $dF_\alpha$ must be if it exists: $dF^i_\alpha = dF_\alpha \circ \theta_i$ and $\sum \theta_i \circ \pi_i = I$ together
imply that $dF_\alpha = \sum dF^i_\alpha \circ \pi_i$.


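Theorem 8.2. Let $A$ be an open subset of the normed linear space
$V = V_1 \times V_2$, and suppose that $F: A \to W$ has continuous partial
differentials $dF^1_{\langle\alpha,\beta\rangle}$ and $dF^2_{\langle\alpha,\beta\rangle}$ on $A$. Then $dF_{\langle\alpha,\beta\rangle}$ exists and is continuous
on $A$, and $dF_{\langle\alpha,\beta\rangle}(\xi, \eta) = dF^1_{\langle\alpha,\beta\rangle}(\xi) + dF^2_{\langle\alpha,\beta\rangle}(\eta)$.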


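Proof. We shall use the sum norm on $V = V_1 \times V_2$. Given $\epsilon$, we choose $\delta$ so
that $\|dF^i_{\langle\mu,\nu\rangle} - dF^i_{\langle\alpha,\beta\rangle}\| < \epsilon$ for every $\langle\mu,\nu\rangle$ in the $\delta$-ball about $\langle\alpha,\beta\rangle$
and for $i = 1, 2$. Setting

$$G(\zeta) = F(\alpha + \zeta, \beta + \eta) - dF^1_{\langle\alpha,\beta\rangle}(\zeta),$$

we have

$$\|dG_\zeta\| = \|dF^1_{\langle\alpha+\zeta,\,\beta+\eta\rangle} - dF^1_{\langle\alpha,\beta\rangle}\| < \epsilon \quad \text{if } \|\langle\zeta,\eta\rangle\| < \delta,$$

and the corollary of Theorem 7.4 implies that

$$\|F(\alpha + \xi, \beta + \eta) - F(\alpha, \beta + \eta) - dF^1_{\langle\alpha,\beta\rangle}(\xi)\| \le \epsilon\|\xi\|$$

when $\|\langle\xi,\eta\rangle\| < \delta$. Arguing similarly with

$$H(\eta) = F(\alpha, \beta + \eta) - dF^2_{\langle\alpha,\beta\rangle}(\eta),$$

we find that

$$\|F(\alpha, \beta + \eta) - F(\alpha, \beta) - dF^2_{\langle\alpha,\beta\rangle}(\eta)\| \le \epsilon\|\eta\|$$

when $\|\langle 0,\eta\rangle\| < \delta$. Combining the two inequalities, we have

$$\|\Delta F_{\langle\alpha,\beta\rangle}(\xi, \eta) - T(\langle\xi,\eta\rangle)\| \le \epsilon\|\langle\xi,\eta\rangle\|$$

when $\|\langle\xi,\eta\rangle\| < \delta$, where $T = dF^1_{\langle\alpha,\beta\rangle} \circ \pi_1 + dF^2_{\langle\alpha,\beta\rangle} \circ \pi_2$. That is,
$\Delta F_{\langle\alpha,\beta\rangle} - T = \mathfrak{o}$, and so $dF_{\langle\alpha,\beta\rangle}$ exists and equals $T$. □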


The theorem for more than two factor spaces is a corollary. 


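Theorem 8.3. If $A$ is an open subset of $\prod_1^n V_i$ and $F: A \to W$ is such that for
each $i = 1, \ldots, n$ the partial differential $dF^i_\alpha$ exists for all $\alpha \in A$ and is
continuous as a function of $\alpha = \langle \alpha_1, \ldots, \alpha_n \rangle$, then $dF_\alpha$ exists and is
continuous on $A$. If $\xi = \langle \xi_1, \ldots, \xi_n \rangle$, then $dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i)$.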


Proof. The existence and continuity of $dF^1_\alpha$ and $dF^2_\alpha$ imply by the theorem that
$dF^1_\alpha \circ \pi_1 + dF^2_\alpha \circ \pi_2$ is the differential of $F$ considered as a function of the first
two variables when the others are held fixed. Since it is the sum of continuous
functions, it is itself continuous in $\alpha$, and we can now apply the theorem again
to add $dF^3_\alpha$ to this sum partial differential, concluding that $\sum_1^3 dF^i_\alpha \circ \pi_i$ is the
partial differential of $F$ on the factor space $V_1 \times V_2 \times V_3$, and so on (which is
colloquial for induction). □


As an illustration of the use of these two theorems, we shall deduce the
general product rule (although a direct proof based on $\Delta$-estimates is perfectly
feasible). A general product is simply a bounded bilinear mapping $\omega: X \times Y \to W$,
where $X$, $Y$, and $W$ are all normed linear spaces. The boundedness inequality
here is $\|\omega(\xi, \eta)\| \le b\|\xi\|\,\|\eta\|$.

We first show that w is differentiable. 


Lemma 8.3. A bounded bilinear mapping $\omega: X \times Y \to W$ is everywhere
differentiable and $d\omega_{\langle\alpha,\beta\rangle}(\xi, \eta) = \omega(\alpha, \eta) + \omega(\xi, \beta)$.


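Proof. With $\beta$ held fixed, $g_\beta(\xi) = \omega(\xi, \beta)$ is in $\mathrm{Hom}(X, W)$ and therefore is
everywhere differentiable and equal to its own differential. That is, $d\omega^1$ exists
and $d\omega^1_{\langle\alpha,\beta\rangle}(\xi) = \omega(\xi, \beta)$. Since $\beta \mapsto g_\beta$ is a bounded linear mapping,
$d\omega^1_{\langle\alpha,\beta\rangle} = g_\beta$ is a continuous function of $\langle\alpha,\beta\rangle$. Similarly, $d\omega^2_{\langle\alpha,\beta\rangle}(\eta) = \omega(\alpha, \eta)$,
and $d\omega^2$ is continuous. The lemma is now a direct corollary of Theorem
8.2. □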
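
As a numerical illustration of Lemma 8.3 (a sketch, not part of the original text), the cross product on $\mathbb{R}^3$ can play the role of the bounded bilinear map $\omega$. The helper names `cross`, `d_omega`, and `remainder` are hypothetical and chosen only for this example. For a bilinear map the remainder $\Delta\omega_{\langle\alpha,\beta\rangle}(\xi,\eta) - d\omega_{\langle\alpha,\beta\rangle}(\xi,\eta)$ is exactly $\omega(\xi,\eta)$, which is why it is small compared with the increment.

```python
# Sketch: check Lemma 8.3 for omega = cross product on R^3 (an assumed example).

def cross(x, y):
    """Bilinear map omega(x, y) = x cross y on R^3."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def sub(x, y):
    return tuple(a - b for a, b in zip(x, y))

def d_omega(alpha, beta, xi, eta):
    """Differential from Lemma 8.3: omega(alpha, eta) + omega(xi, beta)."""
    return add(cross(alpha, eta), cross(xi, beta))

def remainder(alpha, beta, xi, eta):
    """Delta-omega minus d-omega at <alpha, beta> for the increment <xi, eta>."""
    delta = sub(cross(add(alpha, xi), add(beta, eta)), cross(alpha, beta))
    return sub(delta, d_omega(alpha, beta, xi, eta))

alpha, beta = (1, 2, 3), (4, 5, 6)
xi, eta = (1, 0, 2), (0, 3, 1)
# With integer inputs the identity remainder = omega(xi, eta) holds exactly.
print(remainder(alpha, beta, xi, eta) == cross(xi, eta))
```

Since $\|\omega(\xi,\eta)\| \le b\|\xi\|\,\|\eta\|$, the remainder is quadratic in the increment, which is the direct $\Delta$-estimate the text mentions as an alternative proof.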


If $\omega(\xi, \eta)$ is thought of as a product of $\xi$ and $\eta$, then the product of two
functions $g(\zeta)$ and $h(\zeta)$ is $\omega(g(\zeta), h(\zeta))$, where $g$ is from an open subset $A$ of a
normed linear space $V$ to $X$ and $h$ is from $A$ to $Y$. The product rule is now just
what would be expected: the differential of the product is the first times the
differential of the second plus the second times the differential of the first.


Theorem 8.4. If $g: A \to X$ and $h: A \to Y$ are differentiable at $\beta$, then so
is the product $F(\zeta) = \omega(g(\zeta), h(\zeta))$ and

$$dF_\beta(\zeta) = \omega(g(\beta), dh_\beta(\zeta)) + \omega(dg_\beta(\zeta), h(\beta)).$$


Proof. This is a direct corollary of Theorem 8.1, Lemma 8.3, and the chain
rule. □


EXERCISES 


8.1 Find the tangent vector to the arc $\langle \sin t, \cos t, t^2 \rangle$ at $t = 0$; at $t = \pi/2$.
What is the differential of the above parametrized arc at the two given points? That is,
if $f(t) = \langle \sin t, \cos t, t^2 \rangle$, what are $df_0$ and $df_{\pi/2}$?


8.2 Give the detailed proof of Lemma 8.1. 
8.3 The formula

$$dF_\alpha(\xi) = \sum_1^n dF^i_\alpha(\xi_i)$$

is probably obvious in view of the identity

$$\xi = \sum_1^n \theta_i(\xi_i)$$

and the definition of partial differentials, but write out an explicit, detailed proof
anyway.

8.4 Let $F$ be a differentiable mapping from an $n$-dimensional vector space $V$ to a
finite-dimensional vector space $W$, and define $G: V \times W \to W$ by $G(\xi, \eta) = \eta - F(\xi)$.
Thus the graph of $F$ in $V \times W$ is the null set of $G$. Show that the null space of
$dG_{\langle\alpha,\beta\rangle}$ has dimension $n$ for every $\langle\alpha,\beta\rangle \in V \times W$.

8.5 Let $F(\xi, \eta)$ be a continuously differentiable function defined on a product
$A \times B$, where $B$ is a ball and $A$ is an open set. Suppose that $dF^2_{\langle\alpha,\beta\rangle} = 0$ for all
$\langle\alpha,\beta\rangle$ in $A \times B$. Prove that $F$ is independent of $\eta$. That is, show that there is a
continuously differentiable function $G(\xi)$ defined on $A$ such that $F(\xi, \eta) = G(\xi)$ on
$A \times B$.

8.6 By considering a domain in $\mathbb{R}^2$ as indicated at the right, show
that there exists a function $f(x, y)$ on an open set $A$ in $\mathbb{R}^2$ such that

$$\frac{\partial f}{\partial y} = 0$$

everywhere and such that $f(x, y)$ is not a function of $x$ alone.

8.7 Let $F(\xi, \eta, \zeta)$ be any function of three vector variables, and for fixed $\gamma$ set
$G(\xi, \eta) = F(\xi, \eta, \gamma)$. Prove that the partial differential $dF^1_{\langle\alpha,\beta,\gamma\rangle}$ exists if and only
if $dG^1_{\langle\alpha,\beta\rangle}$ exists, in which case they are equal.

8.8 Give a more careful proof of Theorem 8.3. That is, state the inductive hypothesis 
and show that the theorem follows from it and Theorem 8.2. If you are meticulous in 
your argument, you will need a form of the above exercise.

8.9 Let $f$ be a differentiable mapping from $\mathbb{R}^2$ to $\mathbb{R}$. Regarding $\mathbb{R}^2$ as $\mathbb{R} \times \mathbb{R}$, show
that the two partial differentials of $f$ are simply multiplication by its partial derivatives.
Generalize to $n$ dimensions. Show that the above is still true for a map $F$ from $\mathbb{R}^2$ to a
general vector space $V$, the partial derivatives now being vectors.


8.10 Give the details of the proof of Theorem 8.4. 


9. THE DIFFERENTIAL AND $\mathbb{R}^n$


We shall now apply the results of the last two sections to mappings involving the
Cartesian spaces $\mathbb{R}^n$, the bread and butter spaces of finite-dimensional theory.
We start with the domain.


Theorem 9.1. If $F$ is a mapping from (an open subset of) $\mathbb{R}^n$ to a normed
linear space $W$, then the directional derivative of $F$ in the direction of the $j$th
standard basis vector $\delta^j$ is just the partial derivative $\partial F/\partial x_j$, and the $j$th
partial differential is multiplication by $\partial F/\partial x_j$: $dF^j_a(h) = h(\partial F/\partial x_j)(a)$.
More exactly, if any one of the above three objects exists at $a$, then they
all do, with the above relationships.




Proof. We have

$$\frac{\partial F}{\partial x_j}(a) = \lim_{t \to 0} \frac{F(a_1, \ldots, a_j + t, \ldots, a_n) - F(a_1, \ldots, a_j, \ldots, a_n)}{t} = \lim_{t \to 0} \frac{F(a + t\delta^j) - F(a)}{t} = D_{\delta^j}F(a).$$

Moreover, since the restriction of $F$ to $a + \mathbb{R}\delta^j$ is a parametrized arc whose
differential at 0 is by definition the $j$th partial differential of $F$ at $a$ and whose
tangent vector at 0 we have just computed to be $(\partial F/\partial x_j)(a)$, the remainder of
the theorem follows from Theorem 7.1. □


Combining this theorem and Theorem 7.2, we obtain the following result. 


Theorem 9.2. If $V = \mathbb{R}^n$ and $F$ is differentiable at $a$, then the partial
derivatives $(\partial F/\partial x_j)(a)$ all exist and the $n$-tuple of partial derivatives at
$a$, $\{(\partial F/\partial x_j)(a)\}_1^n$, is the skeleton of $dF_a$. In particular,

$$D_yF(a) = \sum_{i=1}^n y_i \frac{\partial F}{\partial x_i}(a).$$


Proof. Since $dF_a(\delta^i) = D_{\delta^i}F(a) = (\partial F/\partial x_i)(a)$, as we noted above, we have

$$D_yF(a) = dF_a(y) = dF_a\Bigl(\sum_1^n y_i\delta^i\Bigr) = \sum_1^n y_i\,dF_a(\delta^i) = \sum_1^n y_i \frac{\partial F}{\partial x_i}(a).$$

All that we have done here is to display $dF_a$ as the linear combination mapping
defined by its skeleton $\{dF_a(\delta^i)\}$ (see Theorem 1.2 of Chapter 1), where $T(\delta^i) =
dF_a(\delta^i)$ is now recognized as the partial derivative $(\partial F/\partial x_i)(a)$. □


The above formula shows the barbarism of the classical notation for partial
derivatives: note how it comes out if we try to evaluate $dF_a(x)$. The notation
$D_{\delta^j}F$ is precise but cumbersome. Other notations are $F_j$ and $D_jF$. Each has its
problems, but the second probably minimizes the difficulties. Using it, our
formula reads $dF_a(y) = \sum_{j=1}^n y_j D_jF(a)$.
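
The formula of Theorem 9.2 can be checked numerically. The following is a minimal sketch under stated assumptions: the sample function `F`, the point `a`, and the direction `y` are invented for illustration, and the directional derivative is estimated by a central difference rather than computed exactly.

```python
# Sketch: compare a finite-difference directional derivative with
# sum_j y_j * D_j F(a) for an assumed sample function F on R^2.

def F(x1, x2):
    return x1**2 * x2 + x2**3

def partials(x1, x2):
    """Exact partial derivatives (D_1 F, D_2 F) of the sample F."""
    return (2 * x1 * x2, x1**2 + 3 * x2**2)

def directional_derivative(a, y, h=1e-5):
    """Central-difference estimate of D_y F(a) = d/dt F(a + t y) at t = 0."""
    fp = F(a[0] + h * y[0], a[1] + h * y[1])
    fm = F(a[0] - h * y[0], a[1] - h * y[1])
    return (fp - fm) / (2 * h)

def skeleton_formula(a, y):
    """dF_a(y) computed from the skeleton of partial derivatives."""
    return sum(yj * dj for yj, dj in zip(y, partials(*a)))

a, y = (1.0, 2.0), (3.0, -1.0)
print(directional_derivative(a, y), skeleton_formula(a, y))
```

The two printed numbers should agree up to the discretization error of the difference quotient.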

In the opposite direction we have the corresponding specialization of 
Theorem 8.3. 


Theorem 9.3. If $A$ is an open subset of $\mathbb{R}^n$, and if $F$ is a mapping from $A$
to a normed linear space $W$ such that all of the partial derivatives
$(\partial F/\partial x_j)(a)$ exist and are continuous on $A$, then $F$ is continuously
differentiable on $A$.


Proof. Since the $j$th partial differential of $F$ is simply multiplication by $\partial F/\partial x_j$,
we are (by Theorem 9.1) assuming the existence and continuity of all the partial
differentials $dF^j_a$ on $A$. Theorem 9.3 thus becomes a special case of Theorem 8.3. □


Now suppose that the range space of $F$ is also a Cartesian space, so that $F$
is a mapping from an open subset $A$ of $\mathbb{R}^n$ to $\mathbb{R}^m$. Then $dF_a$ is in $\mathrm{Hom}(\mathbb{R}^n, \mathbb{R}^m)$.
For computational purposes we want to represent linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$
by their matrices, and it is therefore of the utmost importance to find the matrix
$\mathbf{t}$ of the differential $T = dF_a$. This matrix is called the Jacobian matrix of $F$
at $a$.

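The columns of $\mathbf{t}$ form the skeleton of $dF_a$, and we saw above that this
skeleton is the $n$-tuple of partial derivatives $(\partial F/\partial x_j)(a)$. If we write the
$m$-tuple-valued $F$ loosely as an $m$-tuple of functions, $F = \langle f_1, \ldots, f_m \rangle$, then
according to Lemma 8.1, the $j$th column of $\mathbf{t}$ is the $m$-tuple

$$\frac{\partial F}{\partial x_j}(a) = \Bigl\langle \frac{\partial f_1}{\partial x_j}(a), \ldots, \frac{\partial f_m}{\partial x_j}(a) \Bigr\rangle.$$

Thus,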


Theorem 9.4. Let $F$ be a mapping from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, and
suppose that $F$ is differentiable at $a$. Then the matrix of $dF_a$ (the Jacobian
matrix of $F$ at $a$) is given by

$$t_{ij} = \frac{\partial f_i}{\partial x_j}(a).$$

If we use the notation $y_i = f_i(x)$, we have

$$t_{ij} = \frac{\partial y_i}{\partial x_j}(a).$$


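If we also have a differentiable map $z = G(y) = \langle g_1(y), \ldots, g_l(y) \rangle$ from
an open set $B \subset \mathbb{R}^m$ into $\mathbb{R}^l$, then $dG_b$ has, similarly, the matrix

$$\frac{\partial g_k}{\partial y_i}(b) = \frac{\partial z_k}{\partial y_i}(b).$$

Also, if $B$ contains $b = F(a)$, then the chain rule

$$d(G \circ F)_a = dG_b \circ dF_a$$

has the matrix form

$$\frac{\partial z_k}{\partial x_j}(a) = \sum_{i=1}^m \frac{\partial z_k}{\partial y_i}(b)\,\frac{\partial y_i}{\partial x_j}(a),$$

or simply

$$\frac{\partial z_k}{\partial x_j} = \sum_{i=1}^m \frac{\partial z_k}{\partial y_i}\,\frac{\partial y_i}{\partial x_j}.$$

This is the usual form of the chain rule in the calculus. We see that it is merely
the expression of the composition of linear maps as matrix multiplication.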
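
To see the matrix form of the chain rule at work, here is a small sketch that multiplies two hand-computed Jacobian matrices and compares the product with a finite-difference Jacobian of the composite. The maps `F` and `G`, the point `a`, and all helper names are assumptions made for this illustration only.

```python
# Sketch: check the matrix chain rule for assumed sample maps F, G on R^2.

def F(x):
    return (x[0]**2 - x[1]**2, 2 * x[0] * x[1])

def G(y):
    return (y[0] + y[1], y[0] * y[1])

def jac_F(x):
    """Jacobian matrix of F: entry [i][j] is d f_i / d x_j."""
    return [[2 * x[0], -2 * x[1]],
            [2 * x[1], 2 * x[0]]]

def jac_G(y):
    """Jacobian matrix of G: entry [k][i] is d g_k / d y_i."""
    return [[1.0, 1.0],
            [y[1], y[0]]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[k][i] * B[i][j] for i in range(len(B)))
             for j in range(len(B[0]))] for k in range(len(A))]

def numeric_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian of func at x."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

a = (1.0, 2.0)
b = F(a)
product = matmul(jac_G(b), jac_F(a))              # matrix of dG_b composed with dF_a
numeric = numeric_jacobian(lambda x: G(F(x)), a)  # matrix of d(G o F)_a, estimated
print(product)
print(numeric)
```

The order of the factors matters: the Jacobian of the outer map `G`, evaluated at `b = F(a)`, stands on the left.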

We saw in Section 8 that the ordinary derivative $f'(a)$ of a function $f$ of one
real variable is the skeleton of the differential $df_a$, and it is perfectly reasonable to
generalize this relationship and define the derivative $F'(a)$ of a function $F$ of
$n$ real variables to be the skeleton of $dF_a$, so that $F'(a)$ is the $n$-tuple of partial
derivatives $\{(\partial F/\partial x_i)(a)\}_1^n$, as we saw above. In particular, if $F$ is from an open
subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, then $F'(a)$ is the Jacobian matrix of $F$ at $a$. This gives the
matrix chain rule the standard form

$$(G \circ F)'(a) = G'(F(a))\,F'(a).$$


Some authors use the word 'derivative' for what we have called the
differential, but this is a change from the traditional meaning in the one-variable case,
and we prefer to maintain the distinction as discussed above: the differential $dF_a$
is the linear map approximating $\Delta F_a$, and the derivative $F'(a)$ must be the
matrix of this linear map when the domain and range spaces are Cartesian.
However, we shall stay with the language of Jacobians.

Suppose now that $A$ is an open subset of a finite-dimensional vector space $V$
and that $H: A \to W$ is differentiable at $\alpha \in A$. Suppose that $W$ is also
finite-dimensional and that $\varphi: V \to \mathbb{R}^n$ and $\psi: W \to \mathbb{R}^m$ are any coordinate
isomorphisms. If $\bar{A} = \varphi[A]$, then $\bar{A}$ is an open subset of $\mathbb{R}^n$ and $\bar{H} = \psi \circ H \circ \varphi^{-1}$ is a
mapping from $\bar{A}$ to $\mathbb{R}^m$ which is differentiable at $a = \varphi(\alpha)$, with $d\bar{H}_a =
\psi \circ dH_\alpha \circ \varphi^{-1}$. Then $d\bar{H}_a$ is given by its Jacobian matrix $\{(\partial \bar{h}^i/\partial x_j)(a)\}$, which
we now call the Jacobian matrix of $H$ with respect to the chosen bases in $V$ and
$W$. Change of bases in $V$ and $W$ changes the Jacobian matrix according to the
rule given in Section 2.4.

If $F$ is a mapping from $\mathbb{R}^n$ to itself, then the determinant of the Jacobian
matrix $(\partial f^i/\partial x_j)(a)$ is called the Jacobian of $F$ at $a$. It is designated

$$\frac{\partial(f^1, \ldots, f^n)}{\partial(x_1, \ldots, x_n)}(a) \quad \text{or} \quad \frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}(a)$$

if it is understood that $y_i = f^i(x)$. Another notation is $J_F(a)$ (or simply $J(a)$ if
$F$ is understood). However, this is sometimes used to indicate the differential
$dF_a$, and we shall write $\det J_F(a)$ instead.

If $F(x) = \langle x_1^2 - x_2^2,\ 2x_1x_2 \rangle$, then its Jacobian matrix is

$$\begin{bmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{bmatrix},$$

and $\det J(x) = 4(x_1^2 + x_2^2) = 4(\|x\|_2)^2$.


EXERCISES 


9.1 By analogy with the notion of a parametrized arc, we define a smooth
parametrized two-dimensional surface in a normed linear space $W$ to be a continuously
differentiable map $\Gamma$ from a rectangle $I \times J$ in $\mathbb{R}^2$ to $W$. Suppose that $I \times J =
[-1, 1] \times [-1, 1]$, and invent a definition of the tangent space to the range of $\Gamma$ in $W$
at the point $\Gamma(0, 0)$. Show that the two vectors

$$\frac{\partial \Gamma}{\partial x}(0, 0) \quad \text{and} \quad \frac{\partial \Gamma}{\partial y}(0, 0)$$

are a basis for this tangent space. (This should not have been your definition.)




9.2 Generalize the above exercise to a smooth parametrized $n$-dimensional surface
in a normed linear space $W$.

9.3 Compute the Jacobian matrix of the mapping $\langle x, y \rangle \mapsto \langle x^2, y^2, (x + y)^2 \rangle$.
Show that its rank is two except at the origin.

9.4 Let $F = \langle f^1, f^2, f^3 \rangle$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ be defined by

$$f^1(x, y, z) = x + y + z, \qquad f^2(x, y, z) = x^2 + y^2 + z^2,$$

and

$$f^3(x, y, z) = x^3 + y^3 + z^3.$$

Compute the Jacobian of $F$ at $\langle a, b, c \rangle$. Show that it is nonsingular unless two of the
three coordinates are equal. Describe the locus of its singularities.


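9.5 Compute the Jacobian of the mapping $F: \langle x, y \rangle \mapsto \langle (x + y)^2, y^3 \rangle$ from
$\mathbb{R}^2$ to $\mathbb{R}^2$ at $\langle 1, -1 \rangle$; at $\langle 1, 0 \rangle$; at $\langle a, b \rangle$. Compute the Jacobian of $G: \langle s, t \rangle \mapsto
\langle s - t, s + t \rangle$ at $\langle u, v \rangle$.


9.6 In the above exercise compute the compositions $F \circ G$ and $G \circ F$. Compute the
Jacobian of $F \circ G$ at $\langle u, v \rangle$. Compute the corresponding product of the Jacobians
of $F$ and $G$.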


9.7 Compute the Jacobian matrix and determinant of the mapping $T$ defined by
$x = r\cos\theta$, $y = r\sin\theta$, $z = z$. Composing a function $f(x, y, z)$ with this mapping
gives a new function:

$$g(r, \theta, z) = f(r\cos\theta, r\sin\theta, z).$$

That is, $g = f \circ T$. This composition (substitution) is called the change to cylindrical
coordinates in $\mathbb{R}^3$.


9.8 Compute the Jacobian determinant of the polar coordinate transformation
$\langle r, \theta \rangle \mapsto \langle x, y \rangle$, where $x = r\cos\theta$, $y = r\sin\theta$.


9.9 The transformation to spherical coordinates is given by $x = r\sin\varphi\cos\theta$,
$y = r\sin\varphi\sin\theta$, $z = r\cos\varphi$. Compute the Jacobian

$$\frac{\partial(x, y, z)}{\partial(r, \varphi, \theta)}.$$


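9.10 Write out the chain rule for the following special cases:

$$dw/dt = \;?, \quad \text{where } w = F(x, y),\ x = g(t),\ y = h(t).$$

Find $dw/dt$ when $w = F(x_1, \ldots, x_n)$ and $x_i = g_i(t)$, $i = 1, \ldots, n$. Find $\partial w/\partial u$ when
$w = F(x, y)$, $x = g(u, v)$, $y = h(u, v)$. The special case where $g(u, v) = u$ can be
rewritten

$$\frac{\partial}{\partial x} F(x, h(x, v)).$$

Compute it.

9.11 If $z = f(x, y)$, $x = r\cos\theta$, and $y = r\sin\theta$, show that

$$\Bigl(\frac{\partial z}{\partial r}\Bigr)^2 + \frac{1}{r^2}\Bigl(\frac{\partial z}{\partial \theta}\Bigr)^2 = \Bigl(\frac{\partial z}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial z}{\partial y}\Bigr)^2.$$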




10. ELEMENTARY APPLICATIONS 


The elementary max-min theory from the standard calculus generalizes with
little change, and we include a brief discussion of it at this point.


Theorem 10.1. Let $F$ be a real-valued function defined on an open subset $A$
of a normed linear space $V$, and suppose that $F$ assumes a relative maximum
value at a point $\alpha$ in $A$ where $dF_\alpha$ exists. Then $dF_\alpha = 0$.


Proof. By definition $D_\xi F(\alpha)$ is the derivative $\gamma'(0)$ of the function $\gamma(t) =
F(\alpha + t\xi)$, and the domain of $\gamma$ is a neighborhood of 0 in $\mathbb{R}$. Since $\gamma$ has a relative
maximum value at 0, we have $\gamma'(0) = 0$ by the elementary calculus. Thus
$dF_\alpha(\xi) = D_\xi F(\alpha) = 0$ for all $\xi$, and so $dF_\alpha = 0$. □


A point $\alpha$ such that $dF_\alpha = 0$ is called a critical point. The theorem states
that a differentiable real-valued function can have an interior extremal value
only at a critical point.

If $V$ is $\mathbb{R}^n$, then the above argument shows that a real-valued function $F$
can have a relative maximum (or minimum) at $a$ only if the partial derivatives
$(\partial F/\partial x_i)(a)$ are all zero, and, as in the elementary calculus, this often provides
a way of calculating maximum (or minimum) values. Suppose, for example,
that we want to show that the cube is the most efficient rectangular parallelepiped
from the point of view of minimizing surface area for a given volume $V$. If
the edges are $x$, $y$, and $z$, we have $V = xyz$ and $A = 2(xy + xz + yz) =
2(xy + V/y + V/x)$. Then from $0 = \partial A/\partial x = 2(y - V/x^2)$, we see that
$V = yx^2$, and, similarly, $\partial A/\partial y = 0$ implies that $V = xy^2$. Therefore, $yx^2 =
xy^2$, and since neither $x$ nor $y$ can be 0, it follows that $x = y$. Then $V = yx^2 =
x^3$, and $x = V^{1/3} = y$. Finally, substituting in $V = xyz$ shows that $z = V^{1/3}$.
Our critical configuration is thus a cube, with minimum area $A = 6V^{2/3}$.

It was assumed above that $A$ has an absolute minimum at some point
$\langle x, y, z \rangle$. The reader might enjoy showing that $A \to \infty$ if any of $x$, $y$, $z$ tends
to 0 or $\infty$, which implies that the minimum does indeed exist.
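
The cube computation can be spot-checked numerically. The sketch below assumes a sample volume $V = 8$ (so the critical edge is 2) and a handful of arbitrarily chosen nearby boxes; the function names are hypothetical. It is a sanity check, not a proof that the minimum is absolute.

```python
# Sketch: check the cube example numerically for an assumed volume V = 8.

def area(x, y, V):
    """Surface area 2(xy + V/y + V/x) of a box with edges x, y, V/(xy)."""
    return 2 * (x * y + V / y + V / x)

def grad_area(x, y, V):
    """Partial derivatives (dA/dx, dA/dy) as computed in the text."""
    return (2 * (y - V / x**2), 2 * (x - V / y**2))

V = 8.0
s = V ** (1.0 / 3.0)            # edge of the critical cube
print(grad_area(s, s, V))       # both entries should be close to 0
print(area(s, s, V), 6 * V ** (2.0 / 3.0))

# Nearby boxes of the same volume should not have smaller area.
for dx, dy in [(0.1, 0.0), (0.0, -0.1), (0.05, 0.05), (-0.1, 0.2)]:
    assert area(s + dx, s + dy, V) >= area(s, s, V)
```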

We shall return to the problem of determining critical points in Sections 
12, 15, and 16. 

The condition $dF_\alpha = 0$ is necessary but not sufficient for an interior
maximum or minimum. The reader will remember a sufficient condition from
beginning calculus: If $f'(x) = 0$ and $f''(x) < 0$ ($> 0$), then $x$ is a relative
maximum (minimum) point for $f$. We shall prove the corresponding general theorem
in Section 16. There are more possibilities now; among them we have the
analogous sufficient condition that if $dF_\alpha = 0$ and $d^2F_\alpha$ is negative (positive)
definite as a quadratic form on $V$, then $\alpha$ is a relative maximum (minimum)
point of $F$.

We consider next the notion of a tangent plane to a graph. The calculation 
of tangent lines to curves and tangent planes to surfaces is ordinarily considered 
a geometric application of the derivative, and we take this as sufficient justifica- 
tion for considering the general question here. 




Let $F$ be a mapping from an open subset $A$ of a normed linear space $V$ to
a normed linear space $W$. When we view $F$ as a graph in $V \times W$, we think of it
as a "surface" $S$ lying "over" the domain $A$, generalizing the geometric
interpretation of the graph of a real-valued function of two real variables in $\mathbb{R}^3 = \mathbb{R}^2 \times \mathbb{R}$.
The projection $\pi_1: V \times W \to V$ projects $S$ "down" onto $A$,

$$\langle \xi, F(\xi) \rangle \mapsto \xi,$$

and the mapping $\xi \mapsto \langle \xi, F(\xi) \rangle$ gives the point of $S$ lying "over" $\xi$. Our
geometric imagery views $V$ as the plane (subspace) $V \times \{0\}$ in $V \times W$, just as
we customarily visualize $\mathbb{R}$ as the real axis $\mathbb{R} \times \{0\}$ in $\mathbb{R}^2$.

We now assume that $F$ is differentiable at $\alpha$. Our preliminary discussion in
Section 6 suggested that (the graph of) the linear function $dF_\alpha$ is the tangent
plane to (the graph of) the function $\Delta F_\alpha$ in $V \times W$, and that its translate $M$
through $\langle \alpha, F(\alpha) \rangle$ is the tangent plane at $\langle \alpha, F(\alpha) \rangle$ to the surface $S$ that is
(the graph of) $F$. The equation of this plane is $\eta - F(\alpha) = dF_\alpha(\xi - \alpha)$, and
it is accordingly (the graph of) the affine function $G(\xi) = dF_\alpha(\xi - \alpha) + F(\alpha)$.
Now we know that $dF_\alpha$ is the unique $T$ in $\mathrm{Hom}(V, W)$ such that $\Delta F_\alpha(\zeta) =
T(\zeta) + \mathfrak{o}(\zeta)$, and if we set $\zeta = \xi - \alpha$, it is easy to see that this is the same as
saying that $G$ is the unique affine map from $V$ to $W$ such that

$$F(\xi) - G(\xi) = \mathfrak{o}(\xi - \alpha).$$

That is, $M$ is the unique plane over $V$ that "fits" the surface $S$ around $\langle \alpha, F(\alpha) \rangle$
in the sense of $\mathfrak{o}$-approximation.

However, there is one further geometric fact that greatly strengthens our 
feeling that this really is the tangent plane. 


Theorem 10.2. The plane with equation $\eta - F(\alpha) = dF_\alpha(\xi - \alpha)$ is
exactly the union of all the straight lines through $\langle \alpha, F(\alpha) \rangle$ in $V \times W$
that are tangent to smooth curves on the surface $S = \operatorname{graph} F$ passing
through this point. In other words, the vectors in the subspace $dF_\alpha$ of
$V \times W$ are exactly the tangent vectors to curves lying in $S$ and passing
through $\langle \alpha, F(\alpha) \rangle$.


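Proof. This is nearly trivial. If $\langle \xi, \eta \rangle \in dF_\alpha$, then the arc

$$\gamma(t) = \langle \alpha + t\xi,\ F(\alpha + t\xi) \rangle$$

in $S$ lying over the line $t \mapsto \alpha + t\xi$ in $V$ has $\langle \xi, dF_\alpha(\xi) \rangle = \langle \xi, \eta \rangle$ as its
tangent vector at $\langle \alpha, F(\alpha) \rangle$, by Lemma 8.1 and Theorem 8.2.

Conversely, if $t \mapsto \langle \lambda(t), F(\lambda(t)) \rangle$ is any smooth arc in $S$ passing through $\alpha$,
with $\alpha = \lambda(t_0)$, then its tangent vector at $\langle \alpha, F(\alpha) \rangle$ is

$$\langle \lambda'(t_0),\ dF_\alpha(\lambda'(t_0)) \rangle,$$

a vector in (the graph of) $dF_\alpha$. □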




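As an example of the general tangent plane discussed above, let $F =
\langle f_1, f_2 \rangle$ be the map from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by $f_1(x) = (x_1^2 - x_2^2)/2$, $f_2(x) =
x_1x_2$. The graph of $F$ is a surface over $\mathbb{R}^2$ in $\mathbb{R}^4 = \mathbb{R}^2 \times \mathbb{R}^2$. According to our
above discussion, the tangent plane at $\langle a, F(a) \rangle$ has the equation $y =
dF_a(x - a) + F(a)$. At $a = \langle 1, 2 \rangle$ the Jacobian matrix of $dF_a$ is

$$\begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix}_{\langle 1, 2 \rangle} = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix},$$

and $F(a) = \langle -\tfrac{3}{2}, 2 \rangle$. The equation of the tangent plane $M$ at $\langle 1, 2 \rangle$ is thus

$$\langle y_1, y_2 \rangle = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix} \langle x_1 - 1,\ x_2 - 2 \rangle + \langle -\tfrac{3}{2}, 2 \rangle.$$

Computing the matrix product, we have the scalar equations

$$y_1 = x_1 - 2x_2 + (-1 + 4 - \tfrac{3}{2}) = x_1 - 2x_2 + \tfrac{3}{2},$$
$$y_2 = 2x_1 + x_2 + (-2 - 2 + 2) = 2x_1 + x_2 - 2.$$

Note that these two equations present the affine space $M$ as the intersection
of the hyperplane in $\mathbb{R}^4$ consisting of all $\langle x_1, x_2, y_1, y_2 \rangle$ such that

$$x_1 - 2x_2 - y_1 = -\tfrac{3}{2},$$

with the hyperplane having the equation

$$2x_1 + x_2 - y_2 = 2.$$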
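
The "fit" of the tangent plane in the sense of $\mathfrak{o}$-approximation can be observed numerically for this example. The sketch below uses the map and the two scalar equations from the text; the function names and the chosen increments are assumptions made for illustration.

```python
# Sketch: the affine map from the worked example approximates F near a = <1, 2>
# with an error that shrinks faster than the increment h.
import math

def F(x1, x2):
    return ((x1**2 - x2**2) / 2, x1 * x2)

def tangent_plane(x1, x2):
    """Affine map G(x) = dF_a(x - a) + F(a) at a = <1, 2>, from the scalar equations."""
    return (x1 - 2 * x2 + 1.5, 2 * x1 + x2 - 2)

def fit_ratio(h1, h2):
    """||F(a + h) - G(a + h)|| / ||h|| at a = <1, 2>."""
    a1, a2 = 1.0, 2.0
    f = F(a1 + h1, a2 + h2)
    g = tangent_plane(a1 + h1, a2 + h2)
    err = math.hypot(f[0] - g[0], f[1] - g[1])
    return err / math.hypot(h1, h2)

for scale in (1e-1, 1e-2, 1e-3):
    print(scale, fit_ratio(scale, 2 * scale))
```

The printed ratio should decrease roughly in proportion to the size of the increment, since $F - G$ is purely quadratic in $h$ for this particular map.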


EXERCISES


10.1 Find the maximum value of $f(x, y, z) = x + y + z$ on the ellipsoid
$x^2 + 2y^2 + 3z^2 = 1$.


10.2 Find the maximum value of the linear functional $f(x) = \sum_1^n c_ix_i$ on the unit
sphere $\sum_1^n x_i^2 = 1$.
10.3 Find the (minimum) distance between the two lines 


x = fl and y = s<1,1,1>+ <1,0,—1> 
in RS, 

10.4 Show that there is a uniquely determined pair of closest points on the two lines
$x = ta + l$ and $y = sb + m$ in $\mathbb{R}^n$ unless $b = ka$ for some $k$. We assume that
$a \ne 0 \ne b$. Remember that if $b$ is not of the form $ka$, then $|(a, b)| < \|a\|_2\,\|b\|_2$,
according to the Schwarz inequality.

10.5 Show that the origin is the only critical point of $f(x, y, z) = xy - yz + zx$.
Find a line through the origin along which 0 is a maximum point for $f$, and find another
line along which 0 is a minimum point.




10.6 In the problem of minimizing the area of a rectangular parallelepiped of given
volume $V$ worked out in the text, it was assumed that

$$A = 2\Bigl(xy + \frac{V}{y} + \frac{V}{x}\Bigr)$$

has an absolute minimum at an interior point of the first quadrant. Prove this. Show
first that $A \to \infty$ if $\langle x, y \rangle$ approaches the boundary in any way:

$$x \to 0, \quad x \to \infty, \quad y \to 0, \quad \text{or} \quad y \to \infty.$$

10.7 Let $F: \mathbb{R}^2 \to \mathbb{R}^2$ be the mapping defined by

$$y_1 = \sin(x_1 + x_2), \qquad y_2 = \cos(x_1 - x_2).$$

Find the equation of the tangent plane in $\mathbb{R}^4$ to the graph of $F$ over the point $a =
\langle \pi/4, \pi/4 \rangle$.
10.8 Define $F: \mathbb{R}^3 \to \mathbb{R}^2$ by


3 a 
made y= Ls 
1 1 


Find the equation of the tangent plane to the graph of $F$ in $\mathbb{R}^5$ over $a = \langle 1, 2, -1 \rangle$.

10.9 Let $\omega(\xi, \eta)$ be a bounded bilinear mapping from a product normed linear space
$V \times W$ to a normed linear space $X$. Show that the equation of the tangent plane to the
graph $S$ of $\omega$ in $V \times W \times X$ at the point $\langle \alpha, \beta, \gamma \rangle \in S$ is

$$\zeta = \omega(\xi, \beta) + \omega(\alpha, \eta) - \omega(\alpha, \beta).$$


10.10 Let $F$ be a bounded linear functional on the normed linear space $V$. Show that
the equation of the tangent plane to the graph of $F^3$ in $V \times \mathbb{R}$ over the point $\alpha$ can be
written in the form $y = F^2(\alpha)(3F(\xi) - 2F(\alpha))$.

10.11 Show that if the general equation for a tangent plane given in the text is applied
to a mapping $F$ in $\mathrm{Hom}(V, W)$, then it reduces to the equation for $F$ itself [$\eta = F(\xi)$],
no matter where the point of tangency. (Naturally!)

10.12 Continuing Exercise 9.1, show that the tangent space to the range of $\Gamma$ in $W$
at $\Gamma(0)$ is the projection on $W$ of the tangent space to the graph of $\Gamma$ in $\mathbb{R}^2 \times W$ at the
point $\langle 0, \Gamma(0) \rangle$. Now define the tangent plane to range $\Gamma$ in $W$ at $\Gamma(0)$, and show
that it is similarly the projection of the tangent plane to the graph of $\Gamma$.

10.13 Let $F: V \to W$ be differentiable at $\alpha$. Show that the range of $dF_\alpha$ is the
projection on $W$ of the tangent space to the graph of $F$ in $V \times W$ at the point $\langle \alpha, F(\alpha) \rangle$.


11. THE IMPLICIT-FUNCTION THEOREM


The formula for the Jacobian of a composite map that we obtained in Section 9
is reminiscent of the chain rule for the differential of a composite map that we
derived earlier (Section 8). The Jacobian formula involves numbers (partial
derivatives) that we multiply and add; the differential chain rule involves linear
maps (partial differentials) that we compose and add. (The similarity becomes
a full formal analogy if we use block decompositions.) Roughly speaking, the
whole differential calculus goes this way. In the one-variable calculus a
differential is a linear map from the one-dimensional space $\mathbb{R}$ to itself, and is therefore
multiplication by a number, the derivative. In the many-variable calculus when
we decompose with respect to one-dimensional subspaces, we get blocks of such
numbers, i.e., Jacobian matrices. When we generalize the whole theory to vector
spaces that are not one-dimensional, we get essentially the same formulas but
with numbers replaced by linear maps (differentials) and multiplication by
composition.

Thus the derivative of an inverse function is the reciprocal of the derivative
of the function: if $g = f^{-1}$ and $b = f(a)$, then $g'(b) = 1/f'(a)$. The differential
of an inverse map is the composition inverse of the differential of the map: if
$G = F^{-1}$ and $F(\alpha) = \beta$, then $dG_\beta = (dF_\alpha)^{-1}$.

If the equation $g(x, y) = 0$ defines $y$ implicitly as a function of $x$, $y = f(x)$,
we learn to compute $f'(a)$ in the elementary calculus by differentiating

$$g(x, f(x)) = 0,$$

and we get

$$\frac{\partial g}{\partial x}(a, b) + \frac{\partial g}{\partial y}(a, b)\,f'(a) = 0,$$

where $b = f(a)$. Hence

$$f'(a) = -\frac{\partial g/\partial x}{\partial g/\partial y}.$$

We shall see below that if $G(\xi, \eta) = 0$ defines $\eta$ as a function of $\xi$, $\eta = F(\xi)$,
and if $\beta = F(\alpha)$, then we calculate the differential $dF_\alpha$ by differentiating the
identity $G(\xi, F(\xi)) = 0$, and we get a formula formally identical to the above.

Finally, in exactly the same way, the so-called auxiliary variable method of 
solving max-min problems in the elementary calculus has the same formal 
structure as our later solution of a “constrained” maximum problem by Lagrange 
multipliers. 

In this section we shall consider the existence and differentiability of
functions implicitly defined. Suppose that we are given a (vector-valued) function
$G(\xi, \eta)$ of two vector variables, and we want to know whether setting $G$ equal
to 0 defines $\eta$ as a function of $\xi$, that is, whether there exists a unique function $F$
such that $G(\xi, F(\xi))$ is identically zero. Supposing that such an "implicitly
defined" function $F$ exists and that everything is differentiable, we can try
to compute the differential of $F$ at $\alpha$ by differentiating the equation $G(\xi, F(\xi)) =
0$, or $G \circ \langle I, F \rangle = 0$. We get $dG^1_{\langle\alpha,\beta\rangle} \circ dI_\alpha + dG^2_{\langle\alpha,\beta\rangle} \circ dF_\alpha = 0$, where we
have set $\beta = F(\alpha)$. If $dG^2_{\langle\alpha,\beta\rangle}$ is invertible, we can solve for $dF_\alpha$, getting

$$dF_\alpha = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}.$$

Note that this has the same form as the corresponding expression from the
elementary calculus that we reviewed above. If $F$ is uniquely determined, then
so is $dF_\alpha$, and the above calculation therefore strongly suggests that we are
going to need the existence of $(dG^2_{\langle\alpha,\beta\rangle})^{-1}$ as a necessary condition for the
existence of a uniquely defined implicit function around the point $\langle \alpha, \beta \rangle$.
Since $\beta$ is $F(\alpha)$, we also need $G(\alpha, \beta) = 0$. These considerations will lead us to
the right theorem, but we shall have to postpone part of its proof to the next
chapter. What we can prove here is that if there is an implicitly defined function,
then it must be differentiable.


Theorem 11.1. Let $V$, $W$, and $X$ be normed linear spaces, and let $G$ be a
mapping from an open subset $A \times B$ of $V \times W$ to $X$. Suppose that $F$ is a
continuous mapping from $A$ to $B$ implicitly defined by the equation $G(\xi, \eta) = 0$,
that is, satisfying $G(\xi, F(\xi)) = 0$ on $A$. Finally, suppose that $G$ is
differentiable at $\langle \alpha, \beta \rangle$, where $\beta = F(\alpha)$, and that $dG^2_{\langle\alpha,\beta\rangle}$ is invertible. Then
$F$ is differentiable at $\alpha$ and $dF_\alpha = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}$.


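Proof. Set $\eta = \Delta F_\alpha(\xi)$, so that $G(\alpha + \xi, \beta + \eta) = G(\alpha + \xi, F(\alpha + \xi)) = 0$.
Then

$$0 = G(\alpha + \xi, \beta + \eta) - G(\alpha, \beta) = \Delta G_{\langle\alpha,\beta\rangle}(\xi, \eta) = dG_{\langle\alpha,\beta\rangle}(\xi, \eta) + \mathfrak{o}(\xi, \eta)$$
$$= dG^1_{\langle\alpha,\beta\rangle}(\xi) + dG^2_{\langle\alpha,\beta\rangle}(\eta) + \mathfrak{o}(\xi, \eta).$$

Applying $T^{-1}$ to this equation, where $T = dG^2_{\langle\alpha,\beta\rangle}$, and solving for $\eta$, we get

$$\eta = -T^{-1}\bigl(dG^1_{\langle\alpha,\beta\rangle}(\xi)\bigr) + \mathfrak{O}\bigl(\mathfrak{o}(\langle \xi, \eta \rangle)\bigr).$$

This equation is of the form $\eta = \mathfrak{O}(\xi) + \mathfrak{o}(\langle \xi, \eta \rangle)$, and since $\eta = \Delta F_\alpha(\xi)$ is
an infinitesimal $\mathcal{I}(\xi)$, by the continuity of $F$ at $\alpha$, Lemmas 5.1 and 5.2 imply
first that $\eta = \mathfrak{O}(\xi)$ and then that $\langle \xi, \eta \rangle = \mathfrak{O}(\xi)$. Thus $\mathfrak{O}(\mathfrak{o}(\langle \xi, \eta \rangle)) =
\mathfrak{O}(\mathfrak{o}(\mathfrak{O}(\xi))) = \mathfrak{o}(\xi)$, and we have

$$\Delta F_\alpha(\xi) = \eta = S(\xi) + \mathfrak{o}(\xi),$$

where $S = -(dG^2_{\langle\alpha,\beta\rangle})^{-1} \circ dG^1_{\langle\alpha,\beta\rangle}$, an element of $\mathrm{Hom}(V, W)$. Therefore, $F$
is differentiable at $\alpha$ and $dF_\alpha$ has the asserted value. □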
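
In the simplest one-variable setting the formula of Theorem 11.1 can be compared with a directly computed derivative. The sketch below assumes the sample equation $g(x, y) = x^2 + y^2 - 1 = 0$ and the explicit branch $f(x) = \sqrt{1 - x^2}$; these choices, the point $a = 0.6$, and the function names are illustrative assumptions.

```python
# Sketch: compare -(dg/dy)^(-1) * (dg/dx) with a finite-difference derivative
# of an explicitly known implicit function (assumed example: the unit circle).
import math

def g(x, y):
    return x**2 + y**2 - 1.0

def f(x):
    """Implicit function solving g(x, f(x)) = 0 with f > 0."""
    return math.sqrt(1.0 - x**2)

def implicit_derivative(a, b):
    """-(dg/dy)^(-1) * (dg/dx) at <a, b>, the formula of Theorem 11.1."""
    dg_dx = 2 * a
    dg_dy = 2 * b
    return -dg_dx / dg_dy

def numeric_derivative(func, a, h=1e-6):
    """Central-difference estimate of func'(a)."""
    return (func(a + h) - func(a - h)) / (2 * h)

a = 0.6
b = f(a)
print(implicit_derivative(a, b), numeric_derivative(f, a))
```

Here $\partial g/\partial y = 2b$ is nonzero at the chosen point, which is the one-dimensional form of the invertibility hypothesis on $dG^2_{\langle\alpha,\beta\rangle}$.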


We shall show in the next chapter, as an application of the fixed-point
theorem, that if $V$, $W$, and $X$ are finite-dimensional, and if $G$ is a continuously
differentiable mapping from an open subset $A \times B$ of $V \times W$ to $X$ such that
at the point $\langle \alpha, \beta \rangle$ we have both $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ invertible, then
there is a uniquely determined continuous mapping $F$ from a neighborhood $M$ of
$\alpha$ to $B$ such that $F(\alpha) = \beta$ and $G(\xi, F(\xi)) = 0$ on $M$. The same theorem is true
for the more general class of complete normed linear spaces which we shall study
in the next chapter. For these spaces it is also true that if $T^{-1}$ exists, then so
does $S^{-1}$ for all $S$ sufficiently close to $T$, and the mapping $S \mapsto S^{-1}$ is
continuous. Therefore $dG^2_{\langle\mu,\nu\rangle}$ is invertible for all $\langle \mu, \nu \rangle$ sufficiently close to
$\langle \alpha, \beta \rangle$, and the above theorem then implies that $F$ is differentiable on a
neighborhood of $\alpha$. Moreover, only continuous mappings are involved in the
formula given by the theorem for $dF: \alpha \mapsto dF_\alpha$, and it follows that $F$ is in fact
continuously differentiable near $\alpha$. These conclusions constitute the
implicit-function theorem, which we now restate.




Theorem 11.2. Let $V$, $W$, and $X$ be finite-dimensional (or, more generally,
complete) normed linear spaces, let $A \times B$ be an open subset of $V \times W$,
and let $G: A \times B \to X$ be continuously differentiable. Suppose that at the
point $\langle \alpha, \beta \rangle$ in $A \times B$ we have both $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ invertible.
Then there is a ball $M$ about $\alpha$ and a uniquely defined continuously
differentiable mapping $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $G(\xi, F(\xi)) = 0$ on $M$.


The so-called inverse-mapping theorem is a special case of the
implicit-function theorem.


Theorem 11.3. Let $H$ be a continuously differentiable mapping from an
open subset $B$ of a finite-dimensional (or complete) normed linear space $W$
to a normed linear space $V$, and suppose that its differential is invertible at
a point $\beta$. Then $H$ itself is invertible near $\beta$. That is, there is a ball $M$ about
$\alpha = H(\beta)$ and a uniquely determined continuously differentiable function $F$
from $M$ to $B$ such that $F(\alpha) = \beta$ and $H(F(\xi)) = \xi$ on $M$.


Proof. Set $G(\xi, \eta) = \xi - H(\eta)$. Then $G$ is continuously differentiable from
$V \times B$ to $V$ and $dG^2_{\langle\alpha,\beta\rangle} = -dH_\beta$ is invertible. The implicit-function theorem
then gives us a ball $M$ about $\alpha$ and a uniquely determined continuously
differentiable mapping $F$ from $M$ to $B$ such that $F(\alpha) = \beta$ and $0 = G(\xi, F(\xi)) =
\xi - H(F(\xi))$ on $M$. □


The inverse-mapping theorem is often given a slightly different formulation
which we state as a corollary.


Corollary. Under the hypotheses of the above theorem there exists an open
neighborhood $U$ of $\beta$ such that $H$ is injective on $U$, $N = H[U]$ is open in
$V$, and $H^{-1}$ is continuously differentiable on $N$. (See Fig. 3.11.)


Fig. 3.11 


Proof. The proof of the corollary is left as an exercise.


In practice we often have to apply the Cartesian formulations of these
theorems. The student should certainly be able to write these down, but we
shall state them anyway, starting with the simpler inverse-mapping theorem.


Theorem 11.4. Suppose that we are given $n$ continuously differentiable
real-valued functions $G_i(y_1, \ldots, y_n)$, $i = 1, \ldots, n$, of $n$ real variables
defined on a neighborhood $B$ of a point $b$ in $\mathbb{R}^n$ and suppose that the Jacobian
determinant

$$\frac{\partial(G_1, \ldots, G_n)}{\partial(y_1, \ldots, y_n)}(b)$$

is not zero. Then there is a ball $M$ about $a = G(b)$ in $\mathbb{R}^n$ and a uniquely
determined $n$-tuple $F = \langle F_1, \ldots, F_n \rangle$ of continuously differentiable
real-valued functions defined on $M$ such that $F(a) = b$ and $G_i(F(x)) = x_i$ on $M$
for $i = 1, \ldots, n$. That is, $G_i(F_1(x_1, \ldots, x_n), \ldots, F_n(x_1, \ldots, x_n)) = x_i$
for all $x$ in $M$ and for $i = 1, \ldots, n$.


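For example, if $x = \langle y_1^3 + y_2^3,\; y_1^2 + y_2^2\rangle$, then at the point $b = \langle 1, 2\rangle$
we have

\[
\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}
= \det \begin{bmatrix} 3y_1^2 & 3y_2^2 \\ 2y_1 & 2y_2 \end{bmatrix}_{\langle 1,2\rangle}
= \det \begin{bmatrix} 3 & 12 \\ 2 & 4 \end{bmatrix} = -12 \neq 0,
\]

and we therefore know without trying to solve explicitly that there is a unique
solution for $y$ in terms of $x$ near $x = \langle 1^3 + 2^3,\; 1^2 + 2^2\rangle = \langle 9, 5\rangle$. The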
reader would find it virtually impossible to solve for y, since he would quickly 
discover that he had to solve a polynomial equation of degree 6. This clearly 
shows the power of the theorem: we are guaranteed the existence of a mapping 
which may be very difficult if not impossible to find explicitly. (However, in 
the next chapter we shall discover an iterative procedure for approximating the 
inverse mapping as closely as we want.) 
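
As a purely numerical illustration of this example, the following sketch
approximates the local inverse near $\langle 9, 5\rangle$. It uses Newton's method, which is
an assumption made here for convenience and is not necessarily the iterative
procedure of the next chapter; the names `G`, `jacobian`, and `local_inverse`
are hypothetical.

```python
def G(y1, y2):
    """The map of the example: (y1, y2) -> (y1^3 + y2^3, y1^2 + y2^2)."""
    return (y1**3 + y2**3, y1**2 + y2**2)


def jacobian(y1, y2):
    """Jacobian matrix of G, returned as two rows."""
    return ((3 * y1**2, 3 * y2**2), (2 * y1, 2 * y2))


def local_inverse(x1, x2, start=(1.0, 2.0), steps=50, tol=1e-13):
    """Approximate y = F(x) with G(y) = x, starting Newton's method at b = (1, 2)."""
    y1, y2 = start
    for _ in range(steps):
        g1, g2 = G(y1, y2)
        r1, r2 = g1 - x1, g2 - x2      # residual G(y) - x
        if abs(r1) + abs(r2) < tol:
            break
        (a, b), (c, d) = jacobian(y1, y2)
        det = a * d - b * c            # equals -12 at (1, 2)
        # Newton step y <- y - J^{-1} r, with the 2x2 inverse written out.
        y1 -= (d * r1 - b * r2) / det
        y2 -= (a * r2 - c * r1) / det
    return (y1, y2)


y = local_inverse(9.1, 5.05)
print(y, G(*y))
```

The sketch only works for $x$ close enough to $\langle 9, 5\rangle$ that the iteration stays
where the Jacobian determinant is nonzero; the theorem guarantees that such a
neighborhood exists but does not say how large it is.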

Everything we have said here applies all the more to the implicit-function 
theorem, which we now state in Cartesian form. 


Theorem 11.5. Suppose that we are given $m$ continuously differentiable
real-valued functions $G_i(x, y) = G_i(x_1, \ldots, x_n, y_1, \ldots, y_m)$ of $n + m$ real
variables defined on an open subset $A \times B$ of $\mathbb{R}^{n+m}$ and an $(n + m)$-tuple
$\langle a, b\rangle = \langle a_1, \ldots, a_n, b_1, \ldots, b_m\rangle$ such that $G_i(a, b) = 0$ for
$i = 1, \ldots, m$, and such that the Jacobian determinant

\[
\frac{\partial(G_1, \ldots, G_m)}{\partial(y_1, \ldots, y_m)}(a, b)
\]

is not zero. Then there is a ball $M$ about $a$ in $\mathbb{R}^n$ and a uniquely determined
$m$-tuple $F = \langle F_1, \ldots, F_m\rangle$ of continuously differentiable real-valued
functions $F_j(x) = F_j(x_1, \ldots, x_n)$ defined on $M$ such that $b = F(a)$ and
$G_i(x, F(x)) = 0$ on $M$ for $i = 1, \ldots, m$. That is, $b_j = F_j(a_1, \ldots, a_n)$ for
$j = 1, \ldots, m$, and

\[
G_i\bigl(x_1, \ldots, x_n;\; F_1(x_1, \ldots, x_n), \ldots, F_m(x_1, \ldots, x_n)\bigr) = 0
\]

for all $x$ in $M$ and for $i = 1, \ldots, m$.


For example, the equations

\[
\begin{aligned}
x_1^3 + x_2^3 - y_1^2 - y_2^2 &= 0, \\
x_1^2 - x_2^2 - y_1^3 - y_2^3 &= 0
\end{aligned}
\]

can be solved uniquely for $y$ in terms of $x$ near $\langle x, y\rangle = \langle 1, 1, 1, -1\rangle$,
because they hold at that point and because

\[
\frac{\partial(G_1, G_2)}{\partial(y_1, y_2)}
= \det \begin{bmatrix} -2y_1 & -2y_2 \\ -3y_1^2 & -3y_2^2 \end{bmatrix}
= 6(y_1 y_2^2 - y_1^2 y_2)
\]

has the value 12 there. Of course, we mean only that the solution functions
exist, not that we can explicitly produce them.
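
The hypotheses in this example can be checked mechanically. The sketch below
reads the two equations as $x_1^3 + x_2^3 - y_1^2 - y_2^2 = 0$ and
$x_1^2 - x_2^2 - y_1^3 - y_2^3 = 0$ (a reading consistent with the stated point and
the stated value 12), verifies the two hypotheses, and then approximates the
solution function numerically by Newton's method in $y$. The function names are
hypothetical, and the choice of Newton's method is an assumption of this
sketch, not part of the theorem.

```python
def G(x1, x2, y1, y2):
    """Left-hand sides of the two equations."""
    return (x1**3 + x2**3 - y1**2 - y2**2,
            x1**2 - x2**2 - y1**3 - y2**3)


def jacobian_y(y1, y2):
    """Partial Jacobian of G with respect to (y1, y2); it does not involve x."""
    return ((-2 * y1, -2 * y2), (-3 * y1**2, -3 * y2**2))


def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def solve_for_y(x1, x2, start=(1.0, -1.0), steps=50, tol=1e-13):
    """Approximate y = F(x) with G(x, y) = 0 for x near (1, 1)."""
    y1, y2 = start
    for _ in range(steps):
        r1, r2 = G(x1, x2, y1, y2)
        if abs(r1) + abs(r2) < tol:
            break
        (a, b), (c, d) = jacobian_y(y1, y2)
        det = a * d - b * c
        # Newton step in y only; x is held fixed.
        y1 -= (d * r1 - b * r2) / det
        y2 -= (a * r2 - c * r1) / det
    return (y1, y2)


print(G(1, 1, 1, -1), det2(jacobian_y(1, -1)))
print(solve_for_y(1.01, 0.99))
```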


EXERCISES 


11.1 Show that $\langle x, y\rangle \mapsto \langle e^x + e^y,\; e^x + e^{-y}\rangle$ is locally invertible about any
point $\langle a, b\rangle$, and compute the Jacobian matrix of the inverse map.

11.2 Show that $\langle u, v\rangle \mapsto \langle e^u + e^v,\; e^u - e^v\rangle$ is locally invertible about any
point $\langle a, b\rangle$ in $\mathbb{R}^2$, by computing the Jacobian matrix. In this case the whole
mapping is invertible, with an easily computed inverse. Make this calculation, compute
the Jacobian matrix of the inverse map, and verify that the two matrices are inverses
at the appropriate points.

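11.3 Show that the mapping $\langle x, y, z\rangle \mapsto \langle \sin x, \cos y, e^z\rangle$ from $\mathbb{R}^3$ to $\mathbb{R}^3$ is
locally invertible about $\langle 0, \pi/2, 0\rangle$. Show that

\[
\langle x, y, z\rangle \mapsto \langle \sin(x + y + z),\; \cos(x - y + z),\; e^{x + y - z}\rangle
\]

is locally invertible about $\langle \pi/4, -\pi/4, 0\rangle$.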

11.4 Express the second map of the above exercise as the composition of two maps, 
and obtain your answer a second way. 

11.5 Let $F: \langle x, y\rangle \mapsto \langle u, v\rangle$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by
$u = x^2 + y^2$, $v = 2xy$. Compute an inverse $G$ of $F$, being careful to give the domain and
range of $G$. How many inverse mappings are there? Compute the Jacobian matrices
of $F$ at $\langle 1, 2\rangle$ and of $G$ at $\langle 5, 4\rangle$, and show by multiplying them that they are
inverse.

11.6 Consider now the mapping $F: \langle x, y\rangle \mapsto \langle x^3, y^3\rangle$. Show that $dF_{\langle 0,0\rangle}$ is
singular and yet that the mapping has an inverse $G$. What conclusion do we draw
about the differentiability of $G$ at the origin?

11.7 Define $F: \mathbb{R}^2 \to \mathbb{R}^2$ by $\langle x, y\rangle \mapsto \langle e^x \cos y,\; e^x \sin y\rangle$. Prove that $F$ is
locally invertible about every point.

11.8 Define $F: \mathbb{R}^3 \to \mathbb{R}^3$ by $x \mapsto y$ where


vi =m tee+@s—1)4, ye = ei teet (3 —3es), ys = ti tad t as. 


Prove that $x \mapsto y = F(x)$ is locally invertible about $x = \langle 0, 0, 1\rangle$.


11.9 For a function $f: \mathbb{R} \to \mathbb{R}$ the proof of local invertibility around a point $a$ where
$df_a$ is nonsingular is much simpler than the general case. Show first that the Jacobian
matrix of $f$ at $a$ is the number $f'(a)$. We are therefore assuming that $f'(x)$ is continuous
in a neighborhood of $a$ and that $f'(a) \neq 0$. Prove that then $f$ is strictly increasing (or
decreasing) in an interval about $a$. Now finish the theorem. (See Exercise 1.12.)




11.10 Show that the equations 

P++ +3 = 0, t+ a+ y?4 27 = 4, I4-c+yt2=0 
have differentiable solutions $x(t)$, $y(t)$, $z(t)$ around $\langle t, x, y, z\rangle = \langle 0, -1, 1, 0\rangle$.
11.11 Show that the equations 


ett ot 4 4 cM = 4, igi etCsS4 
can be uniquely solved for $u$ and $v$ in terms of $x$ and $y$ around the point $\langle 0, 0, 0, 0\rangle$.
11.12 Let $S$ be the graph of the equation

\[
xz + \sin(xy) + \cos(xz) = 1
\]

in $\mathbb{R}^3$. Determine whether in the neighborhood of $(0, 1, 1)$ $S$ is the graph of a
differentiable function in any of the following forms:

\[
z = f(x, y), \qquad x = g(y, z), \qquad y = h(x, z).
\]

11.13 Given functions $f$ and $g$ from $\mathbb{R}^3$ to $\mathbb{R}$ such that $f(a, b, c) = 0$ and $g(a, b, c) = 0$,
write down the condition on the partial derivatives of $f$ and $g$ that guarantees the
existence of a unique pair of differentiable functions $y = h(x)$ and $z = k(x)$ satisfying
$h(a) = b$, $k(a) = c$, and

\[
f(x, y, z) = f(x, h(x), k(x)) = 0, \qquad g(x, y, z) = g(x, h(x), k(x)) = 0
\]

around $\langle a, b, c\rangle$.
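11.14 Let $G(\xi, \eta, \zeta)$ be a continuously differentiable mapping from $V = \prod_1^3 V_i$ to $W$
such that $dG^3_\alpha: V_3 \to W$ is invertible and $G(\alpha) = G(\alpha_1, \alpha_2, \alpha_3) = 0$. Prove that there
exists a uniquely determined function $\zeta = F(\xi, \eta)$ defined around $\langle \alpha_1, \alpha_2\rangle$ in
$V_1 \times V_2$ such that $G(\xi, \eta, F(\xi, \eta)) = 0$ and $F(\alpha_1, \alpha_2) = \alpha_3$. Also show that

\[
dF^1_{\langle\xi,\eta\rangle} = -\bigl(dG^3_{\langle\xi,\eta,\zeta\rangle}\bigr)^{-1} \circ dG^1_{\langle\xi,\eta,\zeta\rangle},
\]

where $\zeta = F(\xi, \eta)$.

11.15 Let $F(\xi, \eta)$ be a continuously differentiable function from $V \times W$ to $X$, and
suppose that $dF^2_{\langle\alpha,\beta\rangle}$ is invertible. Setting $\gamma = F(\alpha, \beta)$, show that there is a product
neighborhood $L \times M \times N$ of $\langle \gamma, \alpha, \beta\rangle$ in $X \times V \times W$ and a unique continuously
differentiable mapping $G: L \times M \to N$ such that on $L \times M$, $F(\xi, G(\zeta, \xi)) = \zeta$.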


11.16 Suppose that the equation $g(x, y, z) = 0$ can be solved for $z$ in terms of $x$ and $y$.
This means that there is a function $f(x, y)$ such that $g(x, y, f(x, y)) = 0$. Suppose also
that everything is differentiable and compute $\partial z/\partial x$.

11.17 Suppose that the equations $g(x, y, z) = 0$ and $h(x, y, z) = 0$
can be solved for $y$ and $z$ as functions of $x$. Compute $dy/dx$.

11.18 Suppose that $g(x, y, u, v) = 0$ and $h(x, y, u, v) = 0$ can be solved for $u$ and $v$
as functions of $x$ and $y$. Compute $\partial u/\partial x$.

11.19 Compute $dz/dx$ where $x^3 + y^3 + z^3 = 0$ and $x^2 + y^2 + z^2 = 1$.

11.20 If $t^3 + x^3 + y^3 + z^3 = 0$ and $t^2 + x^2 + y^2 + z^2 = 1$, then $\partial z/\partial x$ is ambiguous.
We are obviously going to think of two of the variables as functions of the other two.
Also $z$ is going to be dependent and $x$ independent. But is $t$ or $y$ going to be the other
independent variable? Compute $\partial z/\partial x$ under each of these assumptions.

11.21 We are given four "physical variables" $p$, $v$, $t$, and $\varphi$ such that each of them is a
function of any two of the other three. Show that $\partial t/\partial p$ has two quite different
meanings, and make explicit what the relationship between them is by labeling the various
functions that are relevant and applying the implicit differentiation process.


11.22 Again the "one-dimensional" case is substantially simpler. Let $G$ be a
continuously differentiable mapping from $\mathbb{R}^2$ to $\mathbb{R}$ such that $G(a, b) = 0$ and

\[
(\partial G/\partial y)(a, b) = G_2(a, b) > 0.
\]

Show that there are positive numbers $\epsilon$ and $\delta$ such that for each $c$ in $(a - \delta, a + \delta)$
the function $g(y) = G(c, y)$ is strictly increasing on $[b - \epsilon, b + \epsilon]$ and
$G(c, b - \epsilon) < 0 < G(c, b + \epsilon)$. Conclude from the intermediate-value theorem
(Exercise 1.13) that there exists a unique function $F: (a - \delta, a + \delta) \to (b - \epsilon, b + \epsilon)$
such that $G(x, F(x)) = 0$.
11.23 By applying the same argument used in the above exercise a second time, prove 
that F is continuous. 
11.24 In the inverse-function theorem show that $dF_\alpha = (dH_\beta)^{-1}$. That is, the
differential of the inverse of $H$ is the inverse of the differential of $H$. Show this
a) by applying the implicit-function theorem;
b) by a direct calculation from the identity $H(F(\xi)) = \xi$.
11.25 Again in the context of the inverse-mapping theorem, show that there is a
neighborhood $M$ of $\beta$ in $B$ such that $F(H(\eta)) = \eta$ on $M$. (Don't work at this. Just
apply the theorem again.)
11.26 We continue in the context of the inverse-mapping theorem. Assume the result
(from the next chapter) that if $dH_\beta^{-1}$ exists, then so does $dH_\xi^{-1}$ for $\xi$ sufficiently close
to $\beta$. Show that there is an open neighborhood $U$ of $\beta$ in $B$ such that $H$ is injective on $U$,
$H[U]$ is an open set $N$ in $V$, and $H^{-1}$ is continuously differentiable on $N$.
11.27 Use Exercise 3.21 to give a direct proof of the existence of a Lipschitz
continuous local inverse in the context of the inverse-mapping theorem. [Hint: Apply
Theorem 7.4.]
11.28 A direct proof of the differentiability of an inverse function is simpler than the
implicit-function theorem proof. Work out such a proof, modeling your arguments in a 
general way upon those in Theorem 11.1. 


11.29 Prove that the implicit-function theorem can be deduced from the
inverse-function theorem as follows. Set

\[
H(\xi, \eta) = \langle \xi, G(\xi, \eta)\rangle,
\]

and show that $dH_{\langle\alpha,\beta\rangle}$ has the block diagram

\[
\begin{bmatrix} I & 0 \\ dG^1 & dG^2 \end{bmatrix}.
\]

Then show that $dH_{\langle\alpha,\beta\rangle}^{-1}$ exists from the block diagram results of Chapter 1. Apply
the inverse-mapping theorem.




12. SUBMANIFOLDS AND LAGRANGE MULTIPLIERS 


If $V$ and $W$ are finite-dimensional spaces, with dimensions $n$ and $m$, respectively,
and if $F$ is a continuous mapping from an open subset $A$ of $V$ to $W$, then (the
graph of) $F$ is a subset of $V \times W$ which we visualize as a kind of "$n$-dimensional
surface" $S$ spread out over $A$. (See Section 10.) We shall call $F$ an $n$-dimensional
patch in $V \times W$. More generally, if $X$ is any $(n + m)$-dimensional vector space,
we shall call a subset $S$ an $n$-dimensional patch if there is an isomorphism $\varphi$
from $X$ to a product space $V \times W$ such that $V$ is $n$-dimensional and $\varphi[S]$ is a
patch in $V \times W$. That is, $S$ becomes a patch in the above sense when $X$ is
considered to be $V \times W$. This means that, if $\pi_1$ is the projection of $X = V \times W$
onto $V$, then $\pi_1[S]$ is an open subset $A$ of $V$, and the restriction $\pi_1 \restriction S$ is
one-to-one and has a continuous inverse. If $\pi_2$ is the projection on $W$, then
$F = \pi_2 \circ (\pi_1 \restriction S)^{-1}$ is the map from $A$ to $W$ whose graph in $V \times W$ is $S$ (when
$V \times W$ is identified with $X$).

Now there are important surfaces that aren't such "patch" surfaces. Consider,
for instance, the surface of the unit ball in $\mathbb{R}^3$, $S = \{x : \sum_1^3 x_i^2 = 1\}$. $S$ is
obviously a two-dimensional surface in $\mathbb{R}^3$ which cannot be expressed as a graph,
no matter how we try to express $\mathbb{R}^3$ as a direct sum. However, it should be
equally clear that $S$ is the union of overlapping surface patches. If $\alpha$ is any point
on $S$, then any sufficiently small neighborhood $N$ of $\alpha$ in $\mathbb{R}^3$ will intersect $S$ in a
patch; we take $V$ as the subspace parallel to the tangent plane at $\alpha$ and $W$ as
the perpendicular line through 0. Moreover, this property of $S$ is a completely
adequate definition of what we mean by a submanifold.

A subset $S$ of an $(n + m)$-dimensional vector space $X$ is an $n$-dimensional
submanifold of $X$ if each $\alpha$ on $S$ has a neighborhood $N$ in $X$ whose intersection
with $S$ is an $n$-dimensional patch.

We say that $S$ is smooth if all these patches $S_\alpha$ are smooth, that is, if the
function $F: A \to W$ whose graph in $V \times W$ is the patch $S_\alpha$ (when $X$ is viewed
as $V \times W$) is continuously differentiable for every such patch $S_\alpha$.

The sphere we considered above is a two-dimensional smooth submanifold
of $\mathbb{R}^3$.

Submanifolds are frequently presented as zero sets of mappings. For
example, our sphere above is the zero set of the mapping $G$ from $\mathbb{R}^3$ to $\mathbb{R}$ defined
by $G(x) = \sum_1^3 x_i^2 - 1$. It is obviously important to have a condition
guaranteeing that such a null set is a submanifold.


Theorem 12.1. Let $G$ be a continuously differentiable mapping from an open
subset $U$ of an $(n + m)$-dimensional vector space $X$ to an $m$-dimensional
vector space $Y$ such that $dG_\alpha$ is surjective for every $\alpha$ in the zero set $S$ of $G$.
Then $S$ is an $n$-dimensional submanifold of $X$.


Proof. Choose any point $\gamma$ of $S$. Since $dG_\gamma$ is surjective from the
$(n + m)$-dimensional vector space $X$ to the $m$-dimensional vector space $Y$, we know that
the null space $V$ of $dG_\gamma$ has dimension $n$ (Theorem 2.4, Chapter 2). Let $W$ be any
complement of $V$, and think of $X$ as $V \times W$, so that $G$ now becomes a function of
two vector variables and $\gamma$ is a point $\langle \alpha, \beta\rangle$ such that $G(\alpha, \beta) = 0$. The
restriction of $dG_{\langle\alpha,\beta\rangle}$ to $W$ is an isomorphism from $W$ to $Y$; that is, $(dG^2_{\langle\alpha,\beta\rangle})^{-1}$
exists. Therefore, by the implicit-function theorem, there is a product
neighborhood $S_\delta(\alpha) \times S_\epsilon(\beta)$ of $\langle \alpha, \beta\rangle$ in $X$ whose intersection with $S$ is the graph
of a function on $S_\delta(\alpha)$. This proves our theorem. □


If $S$ is a smooth submanifold, then the function $F$ whose graph is the patch
of $S$ around $\gamma$ (when $X$ is viewed suitably as $V \times W$) is continuously
differentiable, and therefore $S$ has a uniquely determined $n$-dimensional tangent plane $M$
at $\gamma$ that fits $S$ most closely around $\gamma$ in the sense of our $o$-approximations.
If $\gamma = 0$, this tangent plane is an $n$-dimensional subspace, and in general it is
the translate through $\gamma$ of a subspace $N$. We call $N$ the tangent space of $S$ at $\gamma$;
its elements are exactly the vectors in $X$ tangent to parametrized arcs drawn
in $S$ through $\gamma$. What we are going to do later is to describe an $n$-dimensional
manifold $S$ independently of any imbedding of $S$ in a vector space. The tangent
space to $S$ at a point $\gamma$ will still be an invaluable notion, but we are not going to
be able to visualize it by an actual tangent plane in a space $X$ carrying $S$.
Instead, we will have to construct the vector space tangent to $S$ at $\gamma$
somehow.

The clue is provided by Theorem 10.2, which tells us that if $S$ is imbedded
as a submanifold in a vector space $X$, then each vector tangent to $S$ at $\gamma$ can be
presented as the unique tangent vector at $\gamma$ to some smooth curve lying in $S$.
This mapping from the set of smooth curves in $S$ through $\gamma$ to the tangent space
at $\gamma$ is not injective; clearly, different curves can be tangent to each other at $\gamma$
and so have the same tangent vector there. Therefore, the object in $S$ that
corresponds to a tangent vector at $\gamma$ is an equivalence class of smooth curves
through $\gamma$, and this will in fact be our definition of a tangent vector for a general
manifold.

The notion of a submanifold allows us to consider in an elegant way a
classical "constrained" maximum problem. We are given an open subset $U$
of a finite-dimensional vector space $X$, a differentiable real-valued function $F$
defined on $U$, and a submanifold $S$ lying in $U$. We shall suppose that the
submanifold $S$ is the zero set of a continuously differentiable mapping $G$ from $U$
to a vector space $Y$ such that $dG_\gamma$ is surjective for each $\gamma$ on $S$. We wish to
consider the problem of maximizing (or minimizing) $F(\gamma)$ when $\gamma$ is
"constrained" to lie on $S$. We cannot expect to find such a maximum point $\gamma_0$ by
setting $dF_\gamma = 0$ and solving for $\gamma$, because $\gamma_0$ will not be a critical point for $F$.
Consider, for example, the function $g(x) = \sum_1^3 x_i^2 - 1$ from $\mathbb{R}^3$ to $\mathbb{R}$ and $F(x) = x_2$.
Here the "surface" defined by $g = 0$ is the unit sphere $\sum_1^3 x_i^2 = 1$, and on
this sphere $F$ has its maximum value 1 at $\langle 0, 1, 0\rangle$. But $F$ is linear, and so
$dF_\gamma = F$ can never be the zero transformation. The device known as Lagrange
multipliers shows that we can nevertheless find such constrained critical points
by solving $dL_\gamma = 0$ for a suitable function $L$.




Theorem 12.2. Suppose that $F$ has a maximum value on $S$ at the point $\gamma$.
Then there is a functional $l$ in $Y^*$ such that $\gamma$ is a critical point of the
function $F - (l \circ G)$.


Proof. By the implicit-function theorem we can express $X$ as $V \times W$ in such a
way that the neighborhood of $S$ around $\gamma$ is the graph of a mapping $H$ from an
open set $A$ in $V$ to $W$. Thus, expressing $F$ and $G$ as functions on $V \times W$, we
have $G(\xi, \eta) = 0$ near $\gamma = \langle \alpha, \beta\rangle$ if and only if $\eta = H(\xi)$, and the restriction
of $F(\xi, \eta)$ to this zero surface is thus the function $K: A \to \mathbb{R}$ defined by
$K(\xi) = F(\xi, H(\xi))$. By assumption $\alpha$ is a critical point for this function. Thus

\[
0 = dK_\alpha = dF^1_{\langle\alpha,\beta\rangle} + dF^2_{\langle\alpha,\beta\rangle} \circ dH_\alpha.
\]

Also from the identity $G(\xi, H(\xi)) = 0$, we get

\[
0 = dG^1_{\langle\alpha,\beta\rangle} + dG^2_{\langle\alpha,\beta\rangle} \circ dH_\alpha.
\]

Since $dG^2_{\langle\alpha,\beta\rangle}$ is invertible, we can solve the second equation for $dH_\alpha$ and
substitute in the first, thus getting, dropping the subscripts for simplicity,

\[
dF^1 - dF^2 \circ (dG^2)^{-1} \circ dG^1 = 0.
\]

Let $l \in Y^*$ be the functional $dF^2 \circ (dG^2)^{-1}$. Then we have $dF^1 = l \circ dG^1$ and,
by definition, $dF^2 = l \circ dG^2$. Composing the first equation (on the right) with
$\pi_1: V \times W \to V$ and the second with $\pi_2$, and adding, we get
$dF_{\langle\alpha,\beta\rangle} = l \circ dG_{\langle\alpha,\beta\rangle}$. That is, $d(F - l \circ G)_\gamma = 0$. □


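Nothing we have said so far explains the phrase "Lagrange multipliers".
This comes out of the Cartesian expression of the theorem, where we have $U$
an open subset of a Cartesian space $\mathbb{R}^n$, $Y = \mathbb{R}^m$, $G = \langle g^1, \ldots, g^m\rangle$, and $l$ in
$Y^*$ of the form $l(y) = \sum_1^m c_i y_i$. Then $F - l \circ G = F - \sum_1^m c_i g^i$, and
$d(F - l \circ G)_\gamma = 0$ becomes

\[
\frac{\partial F}{\partial x_j} - \sum_{i=1}^m c_i \frac{\partial g^i}{\partial x_j} = 0, \qquad j = 1, \ldots, n.
\]

These equations together with the $m$ equations $G = \langle g^1, \ldots, g^m\rangle = 0$ give
$m + n$ equations in the $m + n$ unknowns $x_1, \ldots, x_n, c_1, \ldots, c_m$.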

Our original trivial example will show how this works out in practice. We
want to maximize $F(x) = x_2$ from $\mathbb{R}^3$ to $\mathbb{R}$ subject to the constraint $\sum_1^3 x_i^2 = 1$.
Here $g(x) = \sum_1^3 x_i^2 - 1$ is also from $\mathbb{R}^3$ to $\mathbb{R}$, and our method tells us to look
for a critical point of $F - cg$ subject to $g = 0$. Our system of equations is

\[
\begin{aligned}
0 - 2cx_1 &= 0, \\
1 - 2cx_2 &= 0, \\
0 - 2cx_3 &= 0, \\
\sum_1^3 x_i^2 &= 1.
\end{aligned}
\]

The first says that $c = 0$ or $x_1 = 0$, and the second implies that $c$ cannot be 0.
Therefore, $x_1 = x_3 = 0$, and the fourth equation then shows that $x_2 = \pm 1$.

Another example is our problem of minimizing the surface area
$A = 2(xy + yz + zx)$ of a rectangular parallelepiped, subject to the constraint of a
constant volume, $xyz = V$. The theorem says that the minimum point will be a
critical point of $A - \lambda V$ for some $\lambda$, and, setting the differential of this function
equal to zero, we get the equations

\[
\begin{aligned}
2(y + z) - \lambda yz &= 0, \\
2(x + z) - \lambda xz &= 0, \\
2(x + y) - \lambda xy &= 0,
\end{aligned}
\]

together with the constraint

\[
xyz = V.
\]

The first three equations imply that $x = y = z$; the last then gives $V^{1/3}$ as the
common value.
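
The box computation can be checked numerically. The following sketch
substitutes the cube $x = y = z = V^{1/3}$, with multiplier $\lambda = 4/V^{1/3}$ (obtained by
solving the first equation for $\lambda$ when $x = y = z$), into the four equations and
compares its area with that of another box of the same volume. The function
names are hypothetical.

```python
def area(x, y, z):
    """Surface area of the box with edges x, y, z."""
    return 2 * (x * y + y * z + z * x)


def lagrange_residuals(x, y, z, lam, V):
    """Left-hand sides of the three multiplier equations and the constraint."""
    return (2 * (y + z) - lam * y * z,
            2 * (x + z) - lam * x * z,
            2 * (x + y) - lam * x * y,
            x * y * z - V)


def cube_candidate(V):
    """The critical point found in the text, together with its multiplier."""
    s = V ** (1.0 / 3.0)
    return s, s, s, 4.0 / s


x, y, z, lam = cube_candidate(8.0)
print(lagrange_residuals(x, y, z, lam, 8.0))
print(area(x, y, z), area(1.0, 2.0, 4.0))  # both boxes have volume 8
```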


*13. FUNCTIONAL DEPENDENCE 


The question, roughly, is this: If we are given a collection of continuous functions,
all defined on some open set $A$, how can we tell whether or not some of them are
functions of the rest? For example, if we are given three real-valued continuous
functions $f_1$, $f_2$, and $f_3$, how can we tell whether or not some one of them is a
function of the other two, say $f_3$ is a function of $f_1$ and $f_2$, which means that there
is a function of two variables $g(x, y)$ such that $f_3(t) = g(f_1(t), f_2(t))$ for all $t$ in
the common domain $A$? If this happens, we say that $f_3$ is functionally dependent
on $f_1$ and $f_2$. This is very nearly the same as asking when it will be the case that
the range $S$ of the mapping $F: t \mapsto \langle f_1(t), f_2(t), f_3(t)\rangle$ is a two-dimensional
submanifold of $\mathbb{R}^3$. However, there are differences in these questions that are
worth noting. If $f_3$ is functionally dependent on $f_1$ and $f_2$, then the range of $F$
certainly lies on a two-dimensional submanifold of $\mathbb{R}^3$, namely, the graph of $g$.
But this is no guarantee that it itself forms a two-dimensional submanifold.
For example, both $f_2$ and $f_3$ might be functionally dependent on $f_1$, $f_2 = g \circ f_1$,
and $f_3 = h \circ f_1$, in which case the range of $F$ lies on the curve $\langle s, g(s), h(s)\rangle$ in
$\mathbb{R}^3$, which is a one-dimensional submanifold. In the opposite direction, the range
of $F$ can be a two-dimensional submanifold $M$ without $f_3$ being functionally
dependent on $f_1$ and $f_2$. All we can conclude in this case is that locally one of the
functions $\{f_i\}_1^3$ is a function of the other two, since locally $M$ is a surface patch,
in the language of the last section. But if we move a little bit away on the
curving surface $M$ to the neighborhood of another point, we may have to solve
for a different one of the functions. Nevertheless, if $M = \operatorname{range} F$ is a subset of
a two-dimensional manifold, it is reasonable to say that the functions $\{f_i\}_1^3$ are
functionally dependent, and we are led to examine this more natural notion.




If we assume that $F = \langle f_1, f_2, f_3\rangle$ is continuously differentiable and that
the rank of $dF_\alpha$ is 3 at some point $\alpha$ in $A$, then the implicit-function theorem
implies that $F[A]$ includes a whole ball in $\mathbb{R}^3$ about the point $F(\alpha)$. Thus a
necessary condition for $M = \operatorname{range} F$ to lie on a two-dimensional submanifold in
$\mathbb{R}^3$ is that the rank of $dF_\alpha$ be everywhere less than 3. We shall see, in fact, that
if the rank of $dF_\alpha$ is 2 for all $\alpha$, then $M = \operatorname{range} F$ is essentially a two-dimensional
manifold. (There is still a tiny difficulty that we shall explain later.) Our tools
are going to be the implicit-function theorem and the following theorem, which
could well have come much earlier, that the rank of $T$ is a "lower
semicontinuous" function of $T$.


Theorem 13.1. Let $V$ and $W$ be finite-dimensional vector spaces, normed
in some way. Then for any $T$ in $\operatorname{Hom}(V, W)$ there is an $\epsilon$ such that

\[
\|S - T\| < \epsilon \;\Rightarrow\; \operatorname{rank} S \geq \operatorname{rank} T.
\]


Proof. Let $T$ have null space $N$ and range $R$, and let $X$ be any complement of
$N$ in $V$. Then the restriction of $T$ to $X$ is an isomorphism to $R$, and hence is
bounded below by some positive $m$. (Its inverse from $R$ to $X$ is bounded by
some $b$, by Theorem 4.2, and we set $m = 1/b$.) Then if $\|S - T\| < m/2$, it
follows that $S$ is bounded below on $X$ by $m/2$, for the inequalities

\[
\|T(\alpha)\| \geq m\|\alpha\| \quad\text{and}\quad \|(S - T)(\alpha)\| \leq (m/2)\|\alpha\|
\]

together imply that $\|S(\alpha)\| \geq (m/2)\|\alpha\|$. In particular, $S$ is injective on $X$, and
so $\operatorname{rank} S = d(\operatorname{range} S) \geq d(X) = d(R) = \operatorname{rank} T$. □
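
A small numerical illustration of lower semicontinuity may be helpful. The
sketch below computes ranks of 2 by 2 matrices by Gaussian elimination with a
tolerance (the helper `rank` and the particular matrices are assumptions chosen
for illustration) and shows that small perturbations of a rank-one matrix have
rank at least one, although the rank may go up.

```python
def rank(matrix, tol=1e-9):
    """Rank of a small matrix by Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Largest entry in this column among the rows not yet used as pivots.
        pivot = max(range(r, len(rows)),
                    key=lambda i: abs(rows[i][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


T = [[1, 0], [0, 0]]                      # rank 1
perturbations = [
    [[1, 0], [0, 1e-3]],                  # rank jumps up to 2
    [[1 + 1e-3, 1e-3], [0, 0]],           # rank stays 1
    [[1, 0], [1e-3, 0]],                  # rank stays 1
]
print(rank(T), [rank(S) for S in perturbations])
```

Note that the inequality cannot be reversed: the first perturbation is
arbitrarily close to $T$ as the entry $10^{-3}$ shrinks, yet its rank is 2.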


We can now prove the general local theorem.


Theorem 13.2. Let $V$ and $W$ be finite-dimensional spaces, let $r$ be an integer
less than the dimension of $W$, and let $F$ be a continuously differentiable map
from an open subset $A \subset V$ to $W$ such that the rank of $dF_\gamma$ is $r$ for all $\gamma$
in $A$. Then each point $\gamma$ in $A$ has a neighborhood $U$ such that $F[U]$ is an
$r$-dimensional patch submanifold of $W$.


Proof. For a fixed $\gamma$ in $A$ let $V_1$ and $Y$ be the null space and range of $dF_\gamma$, let
$V_2$ be a complement of $V_1$ in $V$, and view $V$ as $V_1 \times V_2$. Then $F$ becomes
a function $F(\xi, \eta)$ of two variables, and if $\gamma = \langle \alpha, \beta\rangle$, then $dF^2_{\langle\alpha,\beta\rangle}$ is an
isomorphism from $V_2$ to $Y$. At this point we can already choose the
decomposition $W = W_1 \oplus W_2$ with respect to which $F[A]$ is going to be a graph
(locally). We simply choose any direct sum decomposition $W = W_1 \oplus W_2$
such that $W_2$ is a complement of $Y = \operatorname{range} dF_{\langle\alpha,\beta\rangle}$. Thus $W_1$ might be $Y$,
but it doesn't have to be. Let $P$ be the projection of $W$ onto $W_1$ along $W_2$.
Since $Y$ is a complement of the null space of $P$, we know that $P \restriction Y$ is an
isomorphism from $Y$ to $W_1$. In particular, $W_1$ is $r$-dimensional, and

\[
\operatorname{rank} P \circ dF_{\langle\alpha,\beta\rangle} = r.
\]




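Moreover, and this is crucial, $P$ is an isomorphism from the range of
$dF_{\langle\xi,\eta\rangle}$ to $W_1$ for all $\langle \xi, \eta\rangle$ sufficiently close to $\langle \alpha, \beta\rangle$. For the above rank
theorem implies that $\operatorname{rank} P \circ dF_{\langle\xi,\eta\rangle} \geq \operatorname{rank} P \circ dF_{\langle\alpha,\beta\rangle} = r$ on some
neighborhood of $\langle \alpha, \beta\rangle$. On the other hand, the range of $P \circ dF_{\langle\xi,\eta\rangle}$ is
included in the range of $P$, which is $W_1$, and so $\operatorname{rank} P \circ dF_{\langle\xi,\eta\rangle} \leq r$. Thus
$\operatorname{rank} P \circ dF_{\langle\xi,\eta\rangle} = r$ for $\langle \xi, \eta\rangle$ near $\langle \alpha, \beta\rangle$, and since $\operatorname{rank} dF_{\langle\xi,\eta\rangle} = r$
by hypothesis, we see that $P$ is an isomorphism on the range of any such $dF_{\langle\xi,\eta\rangle}$.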

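Now define $H: W_1 \times A \to W_1$ as the mapping

\[
\langle \zeta, \xi, \eta\rangle \mapsto P \circ F(\xi, \eta) - \zeta.
\]

If $\mu = P \circ F(\alpha, \beta)$, then $dH^3_{\langle\mu,\alpha,\beta\rangle} = P \circ dF^2_{\langle\alpha,\beta\rangle}$, which is an
isomorphism from $V_2$ to $W_1$. Therefore, by the implicit-function theorem there exists
a neighborhood $L \times M \times N$ of $\langle \mu, \alpha, \beta\rangle$ and a uniquely determined
continuously differentiable mapping $G$ from $L \times M$ to $N$ such that

\[
H(\zeta, \xi, G(\zeta, \xi)) = 0
\]

on $L \times M$. That is,

\[
\zeta = P \circ F(\xi, G(\zeta, \xi))
\]

on $L \times M$.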

The remainder of our argument consists in showing that $F(\xi, G(\zeta, \xi))$ is a
function of $\zeta$ alone. We start by differentiating the above equation with respect
to $\xi$, getting

\[
0 = P \circ (dF^1 + dF^2 \circ dG^2) = P \circ dF \circ \langle I, dG^2\rangle.
\]

As noted above, $P$ is an isomorphism on the range of $dF_{\langle\xi,\eta\rangle}$ for all $\langle \xi, \eta\rangle$
sufficiently close to $\langle \alpha, \beta\rangle$, and if we suppose that $L \times M$ is also taken small
enough so that this holds, then the above equation implies that

\[
dF_{\langle\xi,\eta\rangle} \circ \langle I, dG^2\rangle = 0
\]

for all $\langle \zeta, \xi\rangle \in L \times M$. But this is just the statement that the partial
differential with respect to $\xi$ of $F(\xi, G(\zeta, \xi))$ is identically 0, and hence that
$F(\xi, G(\zeta, \xi))$ is a continuously differentiable function $K$ of $\zeta$ alone:

\[
F(\xi, G(\zeta, \xi)) = K(\zeta).
\]

Since $\eta = G(\zeta, \xi)$ and $\zeta = P \circ F(\xi, \eta)$, we thus have $F(\xi, \eta) = K(P \circ F(\xi, \eta))$,
or

\[
F = K \circ P \circ F,
\]

and this holds on the open set $U$ consisting of those points $\langle \xi, \eta\rangle$ in $M \times N$
such that $P \circ F(\xi, \eta) \in L$. If we think of $W$ as $W_1 \times W_2$, then $F$ and $K$
are ordered pairs of functions, $F = \langle F^1, F^2\rangle$ and $K = \langle l, k\rangle$, $P$ is the
mapping $\langle \zeta, \nu\rangle \mapsto \zeta$, and the second component of the above equation is

\[
F^2 = k \circ F^1.
\]

Since $F^1[U] = P \circ F[U] = L$, the above equation says that $F[U]$ is the graph of
the mapping $k$ from $L$ to $W_2$. Moreover, $L$ is an open subset of the $r$-dimensional
vector space $W_1$, and therefore $F[U]$ is an $r$-dimensional patch manifold in
$W = W_1 \times W_2$. □


The above theorem includes the answer to our original question about 
functional dependence. 


Corollary. Let $F = \{f^i\}_1^m$ be an $m$-tuple of continuously differentiable
real-valued functions defined on an open subset $A$ of a normed linear space
$V$, and suppose that the rank of $dF_\alpha$ has the constant value $r$ on $A$, where $r$
is less than $m$. Then any point $\gamma$ in $A$ has a neighborhood $U$ over which
$m - r$ of the functions are functionally dependent on the remaining $r$.


Proof. By hypothesis the range $Y$ of $dF_\gamma = \langle df^1_\gamma, \ldots, df^m_\gamma\rangle$ is an
$r$-dimensional subspace of $\mathbb{R}^m$. We can therefore find a basis for a complementary
subspace $W_2$ by choosing $m - r$ of the standard basis elements $\{\delta^i\}$, and we may
as well renumber the functions $f^i$ so that these are $\delta^{r+1}, \ldots, \delta^m$. Then the
projection $P$ of $\mathbb{R}^m$ onto $\mathbb{R}^r = L(\delta^1, \ldots, \delta^r)$ is an isomorphism from $Y$ to $\mathbb{R}^r$
(since $Y$ is a complement of its null space), and by the theorem there is a
neighborhood $U$ of $\gamma$ over which $(I - P) \circ F$ is a function $k$ of $P \circ F$. But this says
exactly that $\langle f^{r+1}, \ldots, f^m\rangle = k \circ \langle f^1, \ldots, f^r\rangle$. That is, $k$ is an
$(m - r)$-tuple-valued function, $k = \langle k^{r+1}, \ldots, k^m\rangle$, and $f^j = k^j \circ \langle f^1, \ldots, f^r\rangle$ for
$j = r + 1, \ldots, m$. □
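
To see the corollary at work in the simplest case $m = 2$, $r = 1$, the sketch
below compares two pairs of functions on $\mathbb{R}^2$. The pair $x + y$, $(x + y)^2$ has a
Jacobian matrix of rank 1 everywhere, and the second function is indeed $k$ of
the first with $k(s) = s^2$; the pair $x + y$, $xy$ has Jacobian determinant $x - y$,
so its rank is 2 off the diagonal. The functions and the finite-difference
helper are assumptions chosen for illustration.

```python
def jacobian_det(f1, f2, x, y, h=1e-5):
    """Central-difference approximation to the 2x2 Jacobian determinant."""
    a = (f1(x + h, y) - f1(x - h, y)) / (2 * h)
    b = (f1(x, y + h) - f1(x, y - h)) / (2 * h)
    c = (f2(x + h, y) - f2(x - h, y)) / (2 * h)
    d = (f2(x, y + h) - f2(x, y - h)) / (2 * h)
    return a * d - b * c


def f1(x, y):
    return x + y


def f2(x, y):
    return (x + y) ** 2      # f2 = k o f1 with k(s) = s^2


def g2(x, y):
    return x * y             # not a function of f1 near points with x != y


for point in [(0.3, -1.2), (2.0, 0.5)]:
    print(point, jacobian_det(f1, f2, *point), jacobian_det(f1, g2, *point))
```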


Fig. 3.12


We mentioned earlier in the section that there was a difficulty in concluding
that if $F$ is a continuously differentiable map from an open subset $A$ of $V$ to $W$
whose differential has constant rank $r$ less than $d(W)$, then $S = \operatorname{range} F$ is an
$r$-dimensional submanifold of $W$. The flaw can be described as follows. The
definition of a submanifold $S$ of $X$ required that each point of $S$ have a
neighborhood in $X$ whose intersection with $S$ is a patch. In the case before us, what we
can conclude is that if $\beta$ is a point of $S$, then $\beta = F(\alpha)$ for some $\alpha$ in $A$, and $\alpha$
has a neighborhood $U$ whose image under $F$ is a patch. But this image may not
be a full neighborhood of $\beta$ in $S$, because $S$ may curve back on itself in such a
way as to intrude into every neighborhood of $\beta$. Consider, for example, the
one-dimensional curve $\Gamma$ imbedded in $\mathbb{R}^3$ suggested by Fig. 3.12. The curve
begins in the $xz$-plane along the $z$-axis, curves over, and when it comes to the
$xy$-plane it starts spiraling in to the origin in the $xy$-plane (the point of change
over from the $xz$-plane to the $xy$-plane is a singularity that we could smooth out).
The origin is not a point having a neighborhood in $\mathbb{R}^3$ whose intersection with $\Gamma$
is a one-patch, but the full curve is the image of $(-1, 1)$ under a continuously
differentiable injection.

We would consider $\Gamma$ to be a one-dimensional manifold without any difficulty,
but something has gone wrong with its imbedding in $\mathbb{R}^3$, so it is not a
one-dimensional submanifold of $\mathbb{R}^3$.


*14. UNIFORM CONTINUITY AND FUNCTION-VALUED MAPPINGS


In the next chapter we shall see that a continuous function $F$ whose domain is a
bounded closed subset of a finite-dimensional vector space $V$ is necessarily
uniformly continuous. This means that given $\epsilon$, there is a $\delta$ such that

\[
\|\xi - \eta\| < \delta \;\Rightarrow\; \|F(\xi) - F(\eta)\| < \epsilon
\]

for all vectors $\xi$ and $\eta$ in the domain of $F$.

The point is that $\delta$ depends only on $\epsilon$ and not, as in ordinary continuity,
on the "anchor" point at which continuity is being asserted. This is a very
important property. In this section we shall see that it underlies a class of
theorems in which a point map is escalated to a function-valued map, and
properties of the point map imply corresponding properties of the function-valued
map. Such theorems have powerful applications, as we shall see in Section 15
and in Section 1 of Chapter 6. An application that we shall get immediately here
is the theorem on differentiation under the integral sign. However, it is only
Theorem 14.3 that will be used later in the book.

Suppose first that $F(\xi, \eta)$ is a bounded continuous function from a product
open set $M \times N$ to a normed linear space $X$. Holding $\eta$ fixed, we have a function
$f_\eta(\xi) = F(\xi, \eta)$ which is a bounded continuous function on $M$, that is, an
element of the normed linear space $Y = \mathcal{BC}(M, X)$ of all bounded continuous
maps from $M$ to $X$. This function is also indicated $F(\cdot, \eta)$, so that $f_\eta = F(\cdot, \eta)$.
We are supposing that the uniform norm is being used on $Y$:

\[
\|f_\eta\|_\infty = \operatorname{lub}\{\|f_\eta(\xi)\| : \xi \in M\} = \operatorname{lub}\{\|F(\xi, \eta)\| : \xi \in M\}.
\]


Theorem 14.1. In the above context, if $F$ is uniformly continuous, then the
mapping $\eta \mapsto f_\eta$ (or $\eta \mapsto F(\cdot, \eta)$) is continuous, in fact, uniformly continuous,
from $N$ to $Y$.




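Proof. Given $\epsilon$, choose $\delta$ so that

\[
\|\langle \xi, \eta\rangle - \langle \mu, \nu\rangle\| < \delta \;\Rightarrow\; \|F(\xi, \eta) - F(\mu, \nu)\| < \epsilon.
\]

Taking $\mu = \xi$ and rewriting the right-hand side, we have

\[
\|\eta - \nu\| < \delta \;\Rightarrow\; \|f_\eta(\xi) - f_\nu(\xi)\| < \epsilon
\]

for all $\xi$. Thus

\[
\|\eta - \nu\| < \delta \;\Rightarrow\; \|f_\eta - f_\nu\|_\infty \leq \epsilon. \qquad \square
\]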

We have proved that if a function of two variables is uniformly continuous,
then the mappings obtained from it by the general duality principle are
continuous. This phenomenon lies behind many well-known facts. For example:


Corollary. If $F(x, y)$ is a uniformly continuous real-valued function on the
unit square $[0, 1] \times [0, 1]$ in $\mathbb{R}^2$, then $\int_0^1 F(x, y)\,dx$ is a continuous function
of $y$.

Proof. The mapping $y \mapsto \int_0^1 F(x, y)\,dx$ is the composition of the bounded
linear mapping $f \mapsto \int_0^1 f$ from $\mathcal{C}([0, 1])$ to $\mathbb{R}$ with the continuous mapping
$y \mapsto F(\cdot, y)$ from $[0, 1]$ to $\mathcal{C}([0, 1])$, and is continuous as the composition of
continuous mappings. □


We consider next the differentiability of the above duality-induced mapping. 


Theorem 14.2. If $F$ is a bounded continuous mapping from an open product
set $M \times N$ of a normed linear space $V \times W$ to a normed linear space $X$,
and if $dF^2_{\langle\alpha,\beta\rangle}$ exists and is a bounded uniformly continuous function of
$\langle \alpha, \beta\rangle$ on $M \times N$, then $\varphi: \eta \mapsto F(\cdot, \eta)$ is a differentiable mapping from
$N$ to $Y = \mathcal{BC}(M, X)$, and $[d\varphi_\beta(\eta)](\xi) = dF^2_{\langle\xi,\beta\rangle}(\eta)$.


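Proof. Given $\epsilon$, we choose $\delta$ by the uniform continuity of $dF^2$, so that

\[
\|\mu - \nu\| < \delta \;\Rightarrow\; \|dF^2_{\langle\xi,\mu\rangle} - dF^2_{\langle\xi,\nu\rangle}\| < \epsilon
\]

for all $\xi \in M$. The corollary to Theorem 7.4 then implies that

\[
\|\eta\| < \delta \;\Rightarrow\; \|\Delta F^2_{\langle\xi,\beta\rangle}(\eta) - dF^2_{\langle\xi,\beta\rangle}(\eta)\| \leq \epsilon\|\eta\|
\]

for all $\xi \in M$, all $\beta \in N$, and all $\eta$ such that the line segment from $\beta$ to $\beta + \eta$
is in $N$. We fix $\beta$ and rewrite the right-hand side of the above inequality. This
is the heart of the proof. First

\[
\Delta F^2_{\langle\xi,\beta\rangle}(\eta) = F(\xi, \beta + \eta) - F(\xi, \beta)
= [f_{\beta+\eta} - f_\beta](\xi) = [\varphi(\beta + \eta) - \varphi(\beta)](\xi) = [\Delta\varphi_\beta(\eta)](\xi).
\]

Next we can check that if $\|dF^2_{\langle\mu,\nu\rangle}\| \leq b$ for $\langle \mu, \nu\rangle \in M \times N$, then the
mapping $T$ defined by the formula $[T(\eta)](\xi) = dF^2_{\langle\xi,\beta\rangle}(\eta)$ is an element of
$\operatorname{Hom}(W, Y)$ of norm at most $b$. We leave the detailed verification of this as an
exercise for the reader. The last displayed inequality now takes the form

\[
\|\eta\| < \delta \;\Rightarrow\; \|[\Delta\varphi_\beta(\eta) - T(\eta)](\xi)\| \leq \epsilon\|\eta\|,
\]

and hence

\[
\|\eta\| < \delta \;\Rightarrow\; \|\Delta\varphi_\beta(\eta) - T(\eta)\|_\infty \leq \epsilon\|\eta\|.
\]

This says exactly that the mapping $\varphi$ is differentiable at $\beta$ and $d\varphi_\beta = T$. □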


The mapping ¢ is in fact continuously differentiable, as can be seen by 
arguing a little further in the above manner. The situation is very close to being 
an application of Theorem 14.1. 

The classical theorem on differentiability under the integral sign is a corollary
of the above theorem. We give a simple case. Note that if $\eta$ is a real variable $y$,
then the above formula for $d\varphi$ can be rewritten in terms of arc derivatives:

$$[\varphi'(b)](\xi) = \frac{\partial F}{\partial y}(\xi, b).$$


Corollary. If $F(x, y)$ is a continuous real-valued function on the unit
square $[0, 1] \times [0, 1]$, and if $\partial F/\partial y$ exists and is a uniformly continuous
function on the square, then $\int_0^1 F(x, y)\,dx$ is a differentiable function of $y$
and its derivative is $\int_0^1 (\partial F/\partial y)(x, y)\,dx$.


Proof. The mapping $T\colon y \mapsto \int_0^1 F(x, y)\,dx$ is the composition of the bounded
linear mapping $f \mapsto \int_0^1 f(x)\,dx$ from $\mathcal{C}([0, 1])$ to $\mathbb{R}$ with the differentiable mapping
$\varphi\colon y \mapsto F(\cdot, y)$ from $[0, 1]$ to $\mathcal{C}([0, 1])$, and is therefore differentiable by the
composite-function rule. Then Theorem 7.2 and the fact that the differential of
a bounded linear map is itself give

$$T'(y) = \int_0^1 [\varphi'(y)](x)\,dx = \int_0^1 \frac{\partial F}{\partial y}(x, y)\,dx. \qquad \square$$
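The corollary can be checked numerically. The following is only a sketch under stated assumptions: the integrand $F(x, y) = \sin(xy)$, the midpoint quadrature rule, and all function names are illustrative choices that do not come from the text. It compares the derivative predicted by the corollary with a difference quotient of $T$.

```python
import math

def integrate(f, a=0.0, b=1.0, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Illustrative integrand (an assumption, not from the text): F(x, y) = sin(x*y).
def F(x, y):
    return math.sin(x * y)

def dF_dy(x, y):
    """Partial derivative of F with respect to y."""
    return x * math.cos(x * y)

def T(y):
    """T(y) = integral over [0, 1] of F(x, y) dx."""
    return integrate(lambda x: F(x, y))

def T_prime_by_corollary(y):
    """Derivative predicted by the corollary: integrate dF/dy over x."""
    return integrate(lambda x: dF_dy(x, y))

def T_prime_by_difference(y, step=1e-5):
    """Central difference quotient of T, used only for comparison."""
    return (T(y + step) - T(y - step)) / (2 * step)
```

For this particular $F$ the two estimates of $T'(y)$ should agree up to discretization error; the agreement illustrates the statement and is of course not a proof of it.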


We come now to the situation of most importance to us, where a point-to-point
map generates a function-to-function map by composition. Let $A$ be an
open set in a normed linear space $V$, let $S$ be an arbitrary set, and let $\mathcal{A}$ be the
set of bounded maps $f$ from $S$ to $A$. Then $\mathcal{A}$ is a subset of the normed linear space
$\mathcal{B}(S, V)$ of all bounded functions from $S$ to $V$ under the uniform norm. A function
$f \in \mathcal{A}$ will be an interior point of $\mathcal{A}$ if and only if the distance from the range
of $f$ to the boundary of $A$ is a positive number $\delta$, for this is clearly equivalent to
saying that $\mathcal{A}$ includes a ball in $\mathcal{B}(S, V)$ about the point $f$. Now let $g$ be any
bounded mapping from $A$ to a normed linear space $W$, and let $G\colon \mathcal{A} \to \mathcal{B}(S, W)$
be composition by $g$. That is, $h = G(f)$ if and only if $f \in \mathcal{A}$ and $h = g \circ f$. We
can consider both the continuity and differentiability of $G$, but we shall only
work out the differentiability theorem.


Theorem 14.3. Let the function $g\colon A \to W$ be differentiable at each point
$\alpha$ in $A$, and let $dg_\alpha$ be a bounded uniformly continuous function of $\alpha$. Then
the mapping $G\colon \mathcal{A} \to \mathcal{B}(S, W)$ defined by $G(f) = g \circ f$ is differentiable at
any interior point $f$ in $\mathcal{A}$, and $dG_f\colon \mathcal{B}(S, V) \to \mathcal{B}(S, W)$ is defined by

$$[dG_f(h)](s) = dg_{f(s)}(h(s))$$
for all $s \in S$.


Proof. Given $\epsilon$, choose $\delta$ by the uniform continuity of $dg$ so that
$$\|\alpha - \beta\| < \delta \implies \|dg_\alpha - dg_\beta\| < \epsilon,$$
and then apply the corollary to Theorem 7.4 once more to conclude that
$$\|\xi\| < \delta \implies \|\Delta g_\alpha(\xi) - dg_\alpha(\xi)\| \le \epsilon\|\xi\|,$$

provided the line segment from $\alpha$ to $\alpha + \xi$ is in $A$. Now choose any fixed interior
point $f$ in $\mathcal{A}$, and choose $\delta' \le \delta$ so that $B_{\delta'}(f) \subset \mathcal{A}$. Then for any $h$ in $\mathcal{B}(S, V)$,

$$\|h\|_\infty < \delta' \implies \|\Delta g_{f(s)}(h(s)) - dg_{f(s)}(h(s))\| \le \epsilon\|h(s)\|$$
for all $s \in S$. Define a map $T\colon \mathcal{B}(S, V) \to \mathcal{B}(S, W)$ by $[T(h)](s) = dg_{f(s)}(h(s))$.
Then the above displayed inequality can be rewritten as
$$\|h\|_\infty < \delta' \implies \|\Delta G_f(h) - T(h)\|_\infty \le \epsilon\|h\|_\infty.$$

That is, $\Delta G_f = T + o$. We will therefore be done when we have shown that
$T \in \operatorname{Hom}(\mathcal{B}(S, V), \mathcal{B}(S, W))$.
First, we have

$$(T(h_1 + h_2))(s) = dg_{f(s)}((h_1 + h_2)(s)) = dg_{f(s)}(h_1(s) + h_2(s))
= dg_{f(s)}(h_1(s)) + dg_{f(s)}(h_2(s)) = (T(h_1))(s) + (T(h_2))(s).$$
Thus $T(h_1 + h_2) = T(h_1) + T(h_2)$, and homogeneity follows similarly. Second,
if $b$ is a bound to $\|dg_\alpha\|$ on $A$, then $\|T(h)\|_\infty = \operatorname{lub}\{\|(T(h))(s)\| : s \in S\} \le
\operatorname{lub}\{\|dg_{f(s)}\| \cdot \|h(s)\| : s \in S\} \le b\|h\|_\infty$. Therefore, $\|T\| \le b$, and we are
finished. $\square$
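To make the formula $[dG_f(h)](s) = dg_{f(s)}(h(s))$ concrete, here is a minimal sketch under simplifying assumptions that are ours and not the text's: $S$ is a five-point set, $V = W = \mathbb{R}$, $A = \mathbb{R}$, and $g = \sin$, whose derivative is bounded and uniformly continuous. Functions on $S$ are stored as lists, and the names are hypothetical.

```python
import math

S = [0, 1, 2, 3, 4]           # a finite index set standing in for S (assumption)
g, dg = math.sin, math.cos    # g: R -> R with bounded, uniformly continuous derivative

def G(f):
    """Composition operator G(f) = g o f; f is a list indexed by S."""
    return [g(f[s]) for s in S]

def dG(f, h):
    """Differential from Theorem 14.3: [dG_f(h)](s) = dg_{f(s)}(h(s))."""
    return [dg(f[s]) * h[s] for s in S]

def sup_norm(u):
    """Uniform norm of a function on S."""
    return max(abs(v) for v in u)

def remainder_ratio(f, h):
    """||G(f + h) - G(f) - dG_f(h)||_inf / ||h||_inf; should tend to 0 with h."""
    fh = [f[s] + h[s] for s in S]
    delta = [p - q - r for p, q, r in zip(G(fh), G(f), dG(f, h))]
    return sup_norm(delta) / sup_norm(h)

f = [0.3, -1.2, 2.0, 0.0, 5.5]
h = [1.0, -0.5, 0.25, 0.75, -1.0]
# Shrink the increment h and watch the ratio shrink with it.
ratios = [remainder_ratio(f, [t * v for v in h]) for t in (1e-1, 1e-2, 1e-3)]
```

The shrinking ratios are the numerical counterpart of $\Delta G_f = T + o$ in the proof; with a finite $S$ the uniform norm is just a maximum, which is why so little machinery is needed here.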


In the above situation, if $g$ is from $A \times U$ to $W$, so that $G(f)$ is the function
$h$ given by $h(t) = g(f(t), t)$, then nothing is changed except that the theorem is
about $dg^1$ instead of $dg$. If, in addition, $V$ is a product space $V_1 \times V_2$, so that $f$
is of the form $\langle f_1, f_2\rangle$ and $[G(f)](t) = g(f_1(t), f_2(t), t)$, then our rules about
partial differentials give us the formula

$$[dG_f(h)](t) = dg^1_{\langle f_1(t), f_2(t), t\rangle}(h_1(t)) + dg^2_{\langle f_1(t), f_2(t), t\rangle}(h_2(t)).$$


*15. THE CALCULUS OF VARIATIONS 


The problems of the calculus of variations are simply critical-point problems of a
certain type with a characteristic twist in the way the condition $dF_\alpha = 0$ is used.
We shall illustrate the subject by proving one of its standard theorems.

Since we want to solve a constrained maximum problem in which the domain
is an infinite-dimensional vector space, a systematic discussion would start off
with a more general form of the Lagrange multiplier theorem. However, for our
purpose it is sufficient to note that if $S$ is a closed plane $M + \alpha$, then the
restriction of $F$ to $S$ is equivalent to a new function on the vector space $M$, and its
differential at $\beta = \eta + \alpha$ in $S$ is clearly just the restriction of $dF_\beta$ to $M$. The
requirement that $\beta$ be a critical point for the constrained function is therefore
simply the requirement that $dF_\beta$ vanish on $M$.

Let $F$ be a uniformly continuous differentiable real-valued function of three
variables defined on (an open subset of) $W \times W \times \mathbb{R}$, where $W$ is a normed
linear space. Given a closed interval $[a, b] \subset \mathbb{R}$, let $V$ be the normed linear space
$\mathcal{C}^1([a, b], W)$ of smooth arcs $f\colon [a, b] \to W$, with $\|f\|$ taken as $\|f\|_\infty + \|f'\|_\infty$.
The problem is to maximize the (nonlinear) functional $G(f) = \int_a^b F(f(t), f'(t), t)\,dt$,
subject to the restraints $f(a) = \alpha$ and $f(b) = \beta$. That is, we consider only
smooth arcs in $W$ with fixed endpoints $\alpha$ and $\beta$, and we want to find that arc
from $\alpha$ to $\beta$ which maximizes (or minimizes) the integral. Now we can show
that $G$ is a continuously differentiable function from (an open subset of) $V$ to $\mathbb{R}$.
The easiest way to do this is to let $X$ be the space $\mathcal{C}([a, b], W)$ of continuous arcs
under the uniform norm, and to consider first the more general functional $K$
from $X \times X$ to $\mathbb{R}$ defined by $K(f, g) = \int_a^b F(f(t), g(t), t)\,dt$. By Theorem 14.3
the integrand map $\langle f, g\rangle \mapsto F(f(\cdot), g(\cdot), \cdot)$ is differentiable from $X \times X$ to
$\mathcal{C}([a, b])$ and its differential at $\langle f, g\rangle$ evaluated at $\langle h, k\rangle$ is the function

$$dF^1_{\langle f(t), g(t), t\rangle}(h(t)) + dF^2_{\langle f(t), g(t), t\rangle}(k(t)).$$


Since $f \mapsto \int_a^b f(t)\,dt$ is a bounded linear functional on $\mathcal{C}$, it is differentiable and equal
to its differential. The composite-function rule therefore implies that $K$ is
differentiable and that

$$dK_{\langle f, g\rangle}(h, k) = \int_a^b \left[dF^1(h(t)) + dF^2(k(t))\right] dt,$$

where the partial differentials in the integrand are at the point $\langle f(t), g(t), t\rangle$.
Now the pairs $\langle f, g\rangle$ such that $f'$ exists and equals $g$ form a closed subspace of
$X \times X$ which is isomorphic to $V$. It is obvious that they form a subspace, but
to see that it is closed requires the theory of the integral for parametrized arcs
from Chapter 4, for it depends on the representation $f(t) = f(a) + \int_a^t f'(s)\,ds$
and the consequent norm inequality $\|f(t) - f(a)\| \le (t - a)\|f'\|_\infty$. Assuming
this, we see that our original functional $G$ is just the restriction of $K$ to this
subspace (isomorphic to) $V$, and hence is differentiable with

$$dG_f(h) = \int_a^b \left[dF^1(h(t)) + dF^2(h'(t))\right] dt.$$


This differential $dG_f$ is called the first variation of $G$ about $f$.

The fixed endpoints $\alpha$ and $\beta$ for the arc $f$ determine in turn a closed plane $P$
in $V$, for the evaluation maps (coordinate projections) $\pi_x\colon f \mapsto f(x)$ are bounded
and $P$ is the intersection of the hyperplanes $\pi_a = \alpha$ and $\pi_b = \beta$. Since $P$ is a
translate of the subspace $M = \{f \in V : f(a) = f(b) = 0\}$, our constrained
maximum equation is

$$dG_f(h) = \int_a^b \left[dF^1(h(t)) + dF^2(h'(t))\right] dt = 0$$

for all $h$ in $M$.

We come now to the special trick of the calculus of variations, called the 
lemma of Du Bois-Reymond. 

Suppose for simplicity that $W = \mathbb{R}$. Then $F$ is a function $F(x, y, t)$ of three
real variables, the partial differentials are equivalent to ordinary partial
derivatives, and our critical-point equation is

$$dG_f(h) = \int_a^b \left(\frac{\partial F}{\partial x}\,h + \frac{\partial F}{\partial y}\,h'\right) dt = 0.$$

If we integrate the first term in the integral by parts and remember that $h(a) =
h(b) = 0$, we see that the equation becomes

$$\int_a^b \left(\frac{\partial F}{\partial y} - \int_a^t \frac{\partial F}{\partial x}\right) g = 0,$$

where $g = h'$. Since $h$ is an arbitrary continuously differentiable function except
for the constraints $h(a) = h(b) = 0$, we see that $g$ is an arbitrary continuous
function except for the constraint $\int_a^b g(t)\,dt = 0$. That is, $\partial F/\partial y - \int_a^t \partial F/\partial x$ is
orthogonal to the null space $N$ of the linear functional $g \mapsto \int_a^b g(t)\,dt$. Since the
one-dimensional space $N^\perp$ is clearly the set of constant functions,
our condition becomes

$$\frac{\partial F}{\partial y}(f(t), f'(t), t) = \int_a^t \frac{\partial F}{\partial x}(f(s), f'(s), s)\,ds + C.$$


This equation implies, in particular, that the left member is differentiable. This
is not immediately apparent, since $f'$ is only assumed to be continuous.
Differentiating, we conclude finally that $f$ is a critical point of the mapping $G$ if and
only if it is a solution of the differential equation

$$\frac{d}{dt}\,\frac{\partial F}{\partial y}(f(t), f'(t), t) = \frac{\partial F}{\partial x}(f(t), f'(t), t),$$

which is called the Euler equation of the variational problem. It is an ordinary
differential equation for the unknown function $f$; when the indicated derivative
is computed, it takes the form
$$\frac{\partial^2 F}{\partial y^2}\,f'' + \frac{\partial^2 F}{\partial y\,\partial x}\,f' + \frac{\partial^2 F}{\partial y\,\partial t} - \frac{\partial F}{\partial x} = 0.$$
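As an illustration of the critical-point condition, consider the integrand $F(x, y, t) = x^2 + y^2$ on $[0, 1]$ with endpoints $f(0) = 0$, $f(1) = 1$. This example, the test increment $h(t) = \sin \pi t$, and the midpoint quadrature are assumptions made for the sketch, not material from the text. The Euler equation reduces to $f'' = f$, whose solution with these endpoints is $f(t) = \sinh t / \sinh 1$, so the first variation should vanish there and need not vanish at another admissible arc such as $f(t) = t$.

```python
import math

def integrate(f, a, b, n=4000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

# Partial derivatives of the illustrative integrand F(x, y, t) = x**2 + y**2.
def F_x(x, y, t):
    return 2.0 * x

def F_y(x, y, t):
    return 2.0 * y

def first_variation(f, df, h, dh, a=0.0, b=1.0):
    """dG_f(h) = integral of F_x * h + F_y * h' along the arc f."""
    return integrate(
        lambda t: F_x(f(t), df(t), t) * h(t) + F_y(f(t), df(t), t) * dh(t), a, b
    )

# Increment vanishing at both endpoints, i.e. an element of the subspace M.
def h(t):
    return math.sin(math.pi * t)

def dh(t):
    return math.pi * math.cos(math.pi * t)

# Solution of the Euler equation f'' = f with f(0) = 0, f(1) = 1.
def critical(t):
    return math.sinh(t) / math.sinh(1.0)

def d_critical(t):
    return math.cosh(t) / math.sinh(1.0)

# A non-critical arc with the same endpoints.
def line(t):
    return t

def d_line(t):
    return 1.0

var_at_critical = first_variation(critical, d_critical, h, dh)
var_at_line = first_variation(line, d_line, h, dh)
```

Up to quadrature error, `var_at_critical` is zero, while `var_at_line` equals $2/\pi$ (the value of $\int_0^1 2t \sin \pi t\,dt$, the cosine term integrating to zero), so the straight line is not a critical point of this functional.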
If $W$ is not $\mathbb{R}$, we get exactly the same result from the general form of the
integration by parts formula (using Theorem 6.3) and a more sophisticated
version of the above argument. (See Exercises 10.14 and 10.15 of Chapter 4.)
That is, the smooth arc $f$ with fixed endpoints $\alpha$ and $\beta$ is a critical point of the
mapping $g \mapsto \int_a^b F(g(t), g'(t), t)\,dt$ if and only if it satisfies the Euler differential
equation

$$\frac{d}{dt}\,dF^2_{\langle f(t), f'(t), t\rangle} = dF^1_{\langle f(t), f'(t), t\rangle}.$$

This is now a vector-valued equation, with values in $W^*$. If $W$ is finite-dimensional,
with dimension $n$, then a choice of basis makes $W^*$ into $\mathbb{R}^n$, and this
vector equation is equivalent to $n$ scalar equations

$$\frac{d}{dt}\,\frac{\partial F}{\partial y_i}(f(t), f'(t), t) = \frac{\partial F}{\partial x_i}(f(t), f'(t), t),$$

where $F$ is now a function of $2n + 1$ real variables,

$$F(x, y, t) = F(x_1, \ldots, x_n, y_1, \ldots, y_n, t).$$


Finally, let us see what happens to the simpler variational problem ($W = \mathbb{R}$)
when the endpoints of $f$ are not fixed. Now the critical-point equation is $dG_f(h) =
0$ for all $h$ in $V$, and when we integrate by parts it becomes

$$\left.\frac{\partial F}{\partial y}\,h\,\right|_a^b + \int_a^b \left(\frac{\partial F}{\partial x} - \frac{d}{dt}\,\frac{\partial F}{\partial y}\right) h = 0$$
for all $h$ in $V$. We can reason essentially as above, but a little more closely, to
conclude that a function $f$ is a critical point if and only if it satisfies the Euler
equation

$$\frac{d}{dt}\left(\frac{\partial F}{\partial y}\right) - \frac{\partial F}{\partial x} = 0$$

and also the endpoint conditions

$$\left.\frac{\partial F}{\partial y}\right|_{t=a} = \left.\frac{\partial F}{\partial y}\right|_{t=b} = 0.$$

This has been only a quick look at the variational calculus, and the interested
reader can pursue it further in treatises devoted to the subject. There are many
more questions of the general type we have considered. For example, we may
want neither fixed nor completely free endpoints but freedom subject to
constraints. We shall take this up in Chapter 13 in the special case of the
variational equations of mechanics. Or again, $f$ may be a function of two or more
variables and the integral may be a multiple integral. In this case the Euler
equation may become a system of partial differential equations in the unknown $f$.
Finally, there is the question of sufficient conditions for the critical function to
give a maximum or minimum value to the integral. This will naturally involve a
study of the second differential of the functional $G$, or its second variation, as it is
known in this subject.




*16. THE SECOND DIFFERENTIAL AND 
THE CLASSIFICATION OF CRITICAL POINTS 


Suppose that $V$ and $W$ are normed linear spaces, that $A$ is an open subset of $V$,
and that $F\colon A \to W$ is a continuously differentiable mapping. The first
differential of $F$ is the continuous mapping $dF\colon \gamma \mapsto dF_\gamma$ from $A$ to $\operatorname{Hom}(V, W)$. We
now want to study the differentiability of this mapping at the point $\alpha$.
Presumably, we know what it means to say that $dF$ is differentiable at $\alpha$. By
definition $d(dF)_\alpha$ is a bounded linear transformation $T$ from $V$ to $\operatorname{Hom}(V, W)$
such that $\Delta(dF)_\alpha(\eta) - T(\eta) = o(\eta)$. That is, $dF_{\alpha+\eta} - dF_\alpha - T(\eta)$ is an
element of $\operatorname{Hom}(V, W)$ of norm less than $\epsilon\|\eta\|$ for $\eta$ sufficiently small. We set
$d^2F_\alpha = d(dF)_\alpha$, and repeat: $d^2F_\alpha = d^2F_\alpha(\cdot)$ is a linear map from $V$ to $\operatorname{Hom}(V, W)$,
$d^2F_\alpha(\eta) = d^2F_\alpha(\eta)(\cdot)$ is an element of $\operatorname{Hom}(V, W)$, and $d^2F_\alpha(\eta)(\xi)$ is a vector
in $W$. Also, we know that $d^2F_\alpha$ is equivalent to a bounded bilinear map
$$\omega\colon V \times V \to W,$$

where $\omega(\eta, \xi) = d^2F_\alpha(\eta)(\xi)$.

The vector $d^2F_\alpha(\eta)(\xi)$ clearly ought to be some kind of second derivative of
$F$ at $\alpha$, and the reader might even conjecture that it is the mixed derivative in
the directions $\xi$ and $\eta$.

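Theorem 16.1. If $F\colon A \to W$ is continuously differentiable, and if the
second differential $d^2F_\alpha$ exists, then for each fixed $\mu \in V$ the function
$D_\mu F\colon \gamma \mapsto D_\mu F(\gamma)$ from $A$ to $W$ is differentiable at $\alpha$ and
$D_\nu(D_\mu F)(\alpha) = (d^2F_\alpha(\nu))(\mu)$.


Proof. We use the evaluation-at-$\mu$ map $\mathrm{ev}_\mu\colon \operatorname{Hom}(V, W) \to W$ defined for a
fixed $\mu$ in $V$ by $\mathrm{ev}_\mu(T) = T(\mu)$. It is a bounded linear mapping. Then

$$(D_\mu F)(\alpha) = dF_\alpha(\mu) = \mathrm{ev}_\mu(dF_\alpha) = (\mathrm{ev}_\mu \circ dF)(\alpha),$$
so that the function $D_\mu F$ is the composition $\mathrm{ev}_\mu \circ dF$. It is differentiable at $\alpha$
because $d(dF)_\alpha$ exists and $\mathrm{ev}_\mu$ is linear. Thus $(D_\nu(D_\mu F))(\alpha) = d(D_\mu F)_\alpha(\nu) =
d(\mathrm{ev}_\mu \circ dF)_\alpha(\nu) = (\mathrm{ev}_\mu \circ d(dF)_\alpha)(\nu) = \mathrm{ev}_\mu[(d^2F_\alpha)(\nu)] = (d^2F_\alpha(\nu))(\mu)$. $\square$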


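The reader must remember in going through the above argument that $D_\mu F$
is the function $(D_\mu F)(\cdot)$, and he might prefer to use this notation, as follows:

$$D_\nu\big((D_\mu F)(\cdot)\big)\big|_\alpha = d\big((D_\mu F)(\cdot)\big)_\alpha(\nu) = d\big(\mathrm{ev}_\mu \circ dF_{(\cdot)}\big)_\alpha(\nu)
= \big[\mathrm{ev}_\mu \circ d(dF_{(\cdot)})_\alpha\big](\nu) = \mathrm{ev}_\mu\big(d^2F_\alpha(\nu)\big).$$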


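If the domain space $V$ is the Cartesian space $\mathbb{R}^n$, then the differentiability of
$(D_{\delta^j}F)(\cdot) = (\partial F/\partial x_j)(\cdot)$ at $a$ implies the existence of the second partial
derivatives $(\partial^2F/\partial x_i\,\partial x_j)(a)$ by Theorem 9.2, and with $b$ and $c$ fixed, we then have

$$D_c(D_bF) = D_c\Big(\sum_i b_i\,\frac{\partial F}{\partial x_i}\Big) = \sum_i b_i\,D_c\Big(\frac{\partial F}{\partial x_i}\Big)
= \sum_i b_i\Big(\sum_j c_j\,\frac{\partial^2F}{\partial x_j\,\partial x_i}\Big) = \sum_{i,j} b_i c_j\,\frac{\partial^2F}{\partial x_j\,\partial x_i}.$$

Thus,


Corollary 1. If $V = \mathbb{R}^n$ in the above theorem, then the existence of $d^2F_a$
implies the existence of all the second partial derivatives $(\partial^2F/\partial x_i\,\partial x_j)(a)$
and

$$d^2F_a(b, c) = D_b(D_cF)(a) = \sum_{i,j=1}^n b_i c_j\,\frac{\partial^2F}{\partial x_i\,\partial x_j}(a).$$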


Moreover, from the above considerations and Theorem 9.3 we can also 
conclude that: 


Theorem 16.2. If $V = \mathbb{R}^n$, and if all the second partial derivatives
$(\partial^2F/\partial x_i\,\partial x_j)(a)$ exist and are continuous on the open set $A$, then the
second differential $d^2F_a$ exists on $A$ and is continuous.


Proof. We have directly from Theorem 9.3 that each first partial derivative
$(\partial F/\partial x_j)(\cdot)$ is differentiable. But $\partial F/\partial x_j = \mathrm{ev}_{\delta^j} \circ dF$, and the theorem is then a
consequence of the following general principle. $\square$


Lemma. If $\{S_i\}_1^k$ is a finite collection of linear maps on a vector space $W$
such that $S = \langle S_1, \ldots, S_k\rangle$ is invertible, then a mapping $F\colon A \to W$ is
differentiable at $\alpha$ if and only if $S_i \circ F$ is differentiable at $\alpha$ for all $i$.


Proof. For then $S \circ F$ and $F = S^{-1} \circ S \circ F$ are differentiable, by Theorems 8.1
and 6.2. $\square$


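These considerations clearly extend to any number of differentiations. Thus,
if $d^2F_{(\cdot)}\colon \gamma \mapsto d^2F_\gamma$ is differentiable at $a$, then for fixed $b$ and $c$ the evaluation
$d^2F_{(\cdot)}(b, c)$ is differentiable at $a$ and the formula

$$(D_b(D_cF))(\cdot) = d^2F_{(\cdot)}(b, c) = \sum_{i,j} b_i c_j\,\frac{\partial^2F}{\partial x_i\,\partial x_j}(\cdot)$$

shows (for special choices of $b$ and $c$) that all the second partials $(\partial^2F/\partial x_i\,\partial x_j)(\cdot)$
are differentiable at $a$, with

$$(D_dD_bD_cF)(a) = D_d\big((D_bD_cF)(\cdot)\big)\big|_a = \sum_{i,j,k} b_i c_j d_k\,\frac{\partial^3F}{\partial x_k\,\partial x_i\,\partial x_j}(a).$$

Conversely, if all the third partials exist and are continuous on $A$, then the
second partials are differentiable on $A$ by Theorem 9.3, and then $d^2F_{(\cdot)}$ is
differentiable by the lemma, since $(\partial^2F/\partial x_i\,\partial x_j)(\cdot) = \mathrm{ev}_{\langle\delta^i, \delta^j\rangle} \circ d^2F_{(\cdot)}$.

As the reader will remember, it is crucially important in working with
higher-order derivatives that $\partial^2F/\partial x_i\,\partial x_j = \partial^2F/\partial x_j\,\partial x_i$, and we very much
need the same theorem here.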


Theorem 16.3. The second differential is a symmetric function of its two
arguments: $(d^2F_\alpha(\eta))(\xi) = (d^2F_\alpha(\xi))(\eta)$.


Proof. By the definition of $d(dF)_\alpha$, given $\epsilon$, there is a $\delta$ such that
$$\|\Delta(dF)_\alpha(\eta) - d(dF)_\alpha(\eta)\| \le \epsilon\|\eta\|$$

whenever $\|\eta\| \le \delta$. Of course, $\Delta(dF)_\alpha(\eta) = dF_{\alpha+\eta} - dF_\alpha$. If we write down
the same inequality with $\eta$ replaced by $\eta + \zeta$, then the difference of the
transformations in the left members of the two inequalities is

$$(dF_{\alpha+\eta} - dF_{\alpha+\eta+\zeta}) - d^2F_\alpha(-\zeta),$$
and the triangle inequality therefore implies that
$$\|(dF_{\alpha+\eta} - dF_{\alpha+\eta+\zeta}) - d^2F_\alpha(-\zeta)\| \le \epsilon(\|\eta\| + \|\eta + \zeta\|) \le 2\epsilon(\|\eta\| + \|\zeta\|),$$

provided that both $\eta$ and $\eta + \zeta$ have norms at most $\delta$. We shall take $\|\zeta\| \le \delta/3$
and $\|\eta\| \le 2\delta/3$. If we hold $\zeta$ fixed, and if we set $T = d^2F_\alpha(-\zeta)$ and $G(\xi) =
F(\xi) - F(\xi + \zeta)$, then this inequality becomes $\|dG_{\alpha+\eta} - T\| \le 2\epsilon(\|\eta\| + \|\zeta\|)$,
and since it holds whenever $\|\eta\| \le 2\delta/3$, we can apply the corollary to Theorem
7.4 and conclude that $\|\Delta G_{\alpha+\eta}(\xi) - T(\xi)\| \le 2\epsilon(\|\eta\| + \|\zeta\|)\|\xi\|$, provided that $\eta$
and $\eta + \xi$ have norms at most $2\delta/3$. This inequality therefore holds if $\eta$, $\zeta$, and $\xi$
all have norms at most $\delta/3$. If we now set $\zeta = -\eta$, we have

$$dG_{\alpha+\eta} = dF_{\alpha+\eta} - dF_{\alpha+\eta+\zeta} = dF_{\alpha+\eta} - dF_\alpha,$$

and $\Delta G_{\alpha+\eta}(\xi) = F(\alpha + \eta + \xi) - F(\alpha + \eta) - F(\alpha + \xi) + F(\alpha)$. This function
of $\eta$ and $\xi$ is called the second difference of $F$ at $\alpha$, and is designated $\Delta^2F_\alpha(\eta, \xi)$.
Note that it is symmetric in $\xi$ and $\eta$. Our final inequality can now be rewritten as

$$\|\Delta^2F_\alpha(\eta, \xi) - (d^2F_\alpha(\eta))(\xi)\| \le 4\epsilon\|\eta\|\,\|\xi\|.$$
Reversing $\eta$ and $\xi$, and using the symmetry of $\Delta^2F_\alpha$, we see that
$$\|(d^2F_\alpha(\eta))(\xi) - (d^2F_\alpha(\xi))(\eta)\| \le 8\epsilon\|\eta\|\,\|\xi\|,$$

provided $\eta$ and $\xi$ have norms at most $\delta/3$. But now it follows by the usual
homogeneity argument that this inequality holds for all $\eta$ and $\xi$. Finally, since
$\epsilon$ is arbitrary, the left-hand side is zero. $\square$
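The second difference used in this proof is easy to experiment with. The sketch below rests on assumptions of ours: the function $F(x, y) = e^x \sin y + x^2 y$ on $\mathbb{R}^2$, the base point, and the two directions are arbitrary illustrative choices. It compares $\Delta^2F_a(t\eta, t\xi)/t^2$ for a small $t$ with the bilinear form built from the second partial derivatives.

```python
import math

def F(x, y):
    """Illustrative smooth function on R^2 (an assumption, not from the text)."""
    return math.exp(x) * math.sin(y) + x * x * y

def hessian_form(a, eta, xi):
    """d^2F_a(eta, xi), computed from the second partial derivatives of F."""
    x, y = a
    fxx = math.exp(x) * math.sin(y) + 2.0 * y
    fxy = math.exp(x) * math.cos(y) + 2.0 * x
    fyy = -math.exp(x) * math.sin(y)
    return (fxx * eta[0] * xi[0]
            + fxy * (eta[0] * xi[1] + eta[1] * xi[0])
            + fyy * eta[1] * xi[1])

def second_difference(a, eta, xi):
    """F(a + eta + xi) - F(a + eta) - F(a + xi) + F(a)."""
    both = F(a[0] + (eta[0] + xi[0]), a[1] + (eta[1] + xi[1]))
    singles = F(a[0] + eta[0], a[1] + eta[1]) + F(a[0] + xi[0], a[1] + xi[1])
    return both - singles + F(a[0], a[1])

a, eta, xi = (0.3, 0.7), (1.0, 2.0), (-1.0, 0.5)
t = 1e-5
scaled = second_difference(a, [t * v for v in eta], [t * v for v in xi]) / t**2
exact = hessian_form(a, eta, xi)
```

The symmetry of `second_difference` in its two increments is visible from the formula itself, which is exactly the observation the proof exploits; the closeness of `scaled` to `exact` reflects the inequality $\|\Delta^2F_\alpha(\eta, \xi) - (d^2F_\alpha(\eta))(\xi)\| \le 4\epsilon\|\eta\|\,\|\xi\|$.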


The reader will remember from the elementary calculus that a critical
point $a$ for a function $f$ [$f'(a) = 0$] is a relative extremum point if the second
derivative $f''(a)$ exists and is not zero. In fact, if $f''(a) < 0$, then $f$ has a relative
maximum at $a$, because $f''(a) < 0$ implies that $f'$ is decreasing in a neighborhood
of $a$ and the graph of $f$ is therefore concave down in a neighborhood of $a$.
Similarly, $f$ has a relative minimum at $a$ if $f'(a) = 0$ and $f''(a) > 0$. If $f''(a) = 0$,
nothing can be concluded.

If $f$ is a real-valued function defined on an open set $A$ in a finite-dimensional
vector space $V$, if $\alpha \in A$ is a critical point of $f$, and if $d^2f_\alpha$ exists and is a
nonsingular element of $\operatorname{Hom}(V, V^*)$, then we can draw similar conclusions about the
behavior of $f$ near $\alpha$, only now there is a richer variety of possibilities. The
reader is probably already familiar with what happens for a function $f$ from $\mathbb{R}^2$
to $\mathbb{R}$. Then $\alpha$ may be a relative maximum point (a "cap" point on the graph
of $f$), a relative minimum point, or a saddle point as shown in Fig. 3.18 for the
graph of the translated function $\Delta f_\alpha$. However, it must be realized that new
axes may have to be chosen for the orientation of the saddle to the axes to look as
shown. Replacing $f$ by $\Delta f_\alpha$ amounts to supposing that $0$ is the critical point and
that $f(0) = 0$. Note that if $0$ is a saddle point, then there are two complementary
subspaces, the coordinate axes in Fig. 3.18, such that $0$ is a relative
maximum for $f$ when $f$ is restricted to one of them, and a relative minimum point
for the restriction of $f$ to the other.


Fig. 3.13 


We shall now investigate the general case and find that it is just like the
two-dimensional case except that when there is a saddle point the subspace on
which the critical point is a maximum point may have any dimension from 1
to $n - 1$ [where $d(V) = n$]. Moreover, this dimension is exactly the number of
$-1$'s in the standard orthonormal basis representation of the quadratic form
$q(\xi) = \omega(\xi, \xi) = d^2f_\alpha(\xi, \xi)$.

Our hypotheses, then, are that $f$ is a continuously differentiable real-valued
function on an open subset $A$ of a finite-dimensional normed linear space $V$, that
$\alpha \in A$ is a critical point for $f$ ($df_\alpha = 0$), and that the mapping $d^2f_\alpha\colon V \to V^*$
exists and is nonsingular. This last hypothesis is equivalent to assuming that the
bilinear form $\omega(\xi, \eta) = d^2f_\alpha(\xi, \eta)$ has a nonsingular matrix with respect to any
basis for $V$. We now use Theorem 7.1 of Chapter 2 to choose an $\omega$-orthonormal
basis $\{\alpha_i\}_1^n$. Remember that this means that $\omega(\alpha_i, \alpha_j) = 0$ if $i \ne j$, $\omega(\alpha_i, \alpha_i) = 1$
for $i = 1, \ldots, p$, and $\omega(\alpha_i, \alpha_i) = -1$ for $i = p + 1, \ldots, n$. There cannot be
any $0$ values for $\omega(\alpha_i, \alpha_i)$ because the matrix $t_{ij} = \omega(\alpha_i, \alpha_j)$ is nonsingular: if
$\omega(\alpha_i, \alpha_i) = 0$, then the whole $i$th column is zero, the column space has
dimension $\le n - 1$, and the matrix is singular.

We can use the basis isomorphism $\varphi$ to replace $V$ by $\mathbb{R}^n$ (i.e., replace $f$ by
$f \circ \varphi$), and we can therefore suppose that $V = \mathbb{R}^n$ and that the standard basis is
$\omega$-orthonormal, with $\omega(x, y) = \sum_1^p x_i y_i - \sum_{p+1}^n x_i y_i$. Since

$$\omega(\delta^i, \delta^j) = d^2f_a(\delta^i, \delta^j) = D_{\delta^i}D_{\delta^j}f(a) = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(a),$$

our hypothesis of $\omega$-orthonormality is that $(\partial^2f/\partial x_i\,\partial x_j)(a) = 0$ for $i \ne j$,
$\partial^2f/\partial x_i^2 = 1$ for $i = 1, \ldots, p$, and $\partial^2f/\partial x_i^2 = -1$ for $i = p + 1, \ldots, n$.
Since $p$ can have any value from $0$ to $n$, there are $n + 1$ possibilities. We show
first that if $p = n$, then $a$ is a relative minimum of $f$. In this case the quadratic
form $q$ is said to be positive definite, since $q(x) = \omega(x, x)$ is positive for every
nonzero $x$. We also say that the bilinear form $\omega(x, y) = d^2f_a(x, y)$ is positive
definite, and, in the language of Chapter 5, that $\omega$ is a scalar product.


Theorem 16.4. Let $f$ be a continuously differentiable real-valued function
defined on an open subset $A$ of $\mathbb{R}^n$, and let $a \in A$ be a critical point of $f$ at
which $d^2f$ exists and is positive definite. Then $f$ has a relative minimum at $a$.


Proof. We suppose, as above, that the standard basis $\{\delta^i\}_1^n$ is $\omega$-orthonormal.
By the definition of $d^2f_a$, given $\epsilon$, there is a $\delta$ such that

$$\big|(df_{a+y}(x) - df_a(x)) - d^2f_a(x, y)\big| \le \epsilon\|x\|\,\|y\|$$

whenever $\|y\| \le \delta$. Now $df_a = 0$, since $a$ is a critical point of $f$, and $d^2f_a(x, y) =
\sum_1^n x_i y_i$, by the assumption that $\{\delta^i\}_1^n$ is $\omega$-orthonormal. Therefore, if we use
the two-norm on $\mathbb{R}^n$ and set $y = tx$, we have

$$\big|df_{a+tx}(x) - t\|x\|^2\big| \le \epsilon t\|x\|^2.$$

Also, if $h(t) = f(a + tx)$, then $h'(t) = df_{a+tx}(x)$, and this inequality therefore
says that $(1 - \epsilon)t\|x\|^2 \le h'(t) \le (1 + \epsilon)t\|x\|^2$. Integrating, and remembering
that $h(1) - h(0) = f(a + x) - f(a) = \Delta f_a(x)$, we have

$$\Big(\frac{1 - \epsilon}{2}\Big)\|x\|^2 \le \Delta f_a(x) \le \Big(\frac{1 + \epsilon}{2}\Big)\|x\|^2$$

whenever $\|x\| \le \delta$. This shows not only that $a$ is a relative minimum point
but also that $\Delta f_a$ lies between two very close paraboloids when $x$ is sufficiently
small. $\square$


The above argument will work just as well in general. If
$$q(x) = \sum_1^p x_i^2 - \sum_{p+1}^n x_i^2$$

is the quadratic form of the second differential and $\|x\|_2^2 = \sum_1^n x_i^2$, then
replacing $\|x\|^2$ inside the absolute values in the above inequalities by $q(x)$, we conclude
that

$$\frac{q(x) - \epsilon\|x\|^2}{2} \le \Delta f_a(x) \le \frac{q(x) + \epsilon\|x\|^2}{2},$$

or

$$\frac12\Big((1 - \epsilon)\sum_1^p x_i^2 - (1 + \epsilon)\sum_{p+1}^n x_i^2\Big) \le \Delta f_a(x)
\le \frac12\Big((1 + \epsilon)\sum_1^p x_i^2 - (1 - \epsilon)\sum_{p+1}^n x_i^2\Big).$$

This shows that $\Delta f_a$ lies between two very close quadratic surfaces of the
same type when $\|x\| \le \delta$. If $1 \le p \le n - 1$ and $a = 0$, then $f$ has a relative
minimum on the subspace $V_1 = L(\{\delta^i\}_1^p)$ and a relative maximum on the
complementary space $V_2 = L(\{\delta^i\}_{p+1}^n)$.

According to our remarks at the end of Section 2.7, we can read off the type
of a critical point for a function of two variables without orthonormalizing by
looking at the determinant of the matrix of the (assumed nonsingular) form $d^2f_a$.
This determinant is

$$t_{11}t_{22} - (t_{12})^2 = \frac{\partial^2 f}{\partial x_1^2}\,\frac{\partial^2 f}{\partial x_2^2} - \Big(\frac{\partial^2 f}{\partial x_1\,\partial x_2}\Big)^2.$$

If it is positive, then $a$ is either a relative minimum or a relative maximum. We
can tell which by following $f$ along a single line, say the $x_1$-axis. Thus, if
$\partial^2f/\partial x_1^2 < 0$, then $a$ is a relative maximum point. On the other hand, if the
above expression is negative, then $a$ is a saddle point.
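The two-variable test just described translates directly into a few lines of code. This is a sketch only; the function name and the returned labels are hypothetical choices, and the "degenerate" branch merely records that the test assumes a nonsingular form.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Classify a critical point of a function of two variables from its
    second partial derivatives there, using the determinant of d^2 f_a."""
    det = fxx * fyy - fxy ** 2
    if det == 0:
        return "degenerate"  # the form is singular, so the test does not apply
    if det < 0:
        return "saddle"
    # det > 0: definite form; follow f along the x_1-axis to tell which kind.
    return "relative maximum" if fxx < 0 else "relative minimum"
```

For instance, the origin is a critical point of $x^2 + y^2$, of $-x^2 - y^2$, and of $xy$, with second partials $(2, 0, 2)$, $(-2, 0, -2)$, and $(0, 1, 0)$ respectively, and the function returns minimum, maximum, and saddle in turn.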

It is important for the calculus of variations that Theorem 16.4 remains
true when the domain space is replaced by a space of the general type that we
shall study in the next chapter, called a Banach space. The hypotheses now are
that $\alpha$ is a critical point of $f$, that $q(\xi) = d^2f_\alpha(\xi, \xi)$ is positive definite, and that
the scalar product norm $q^{1/2}$ (see Chapter 5) is equivalent to the given norm on
$V$. The proof remains virtually unchanged.


*17. HIGHER ORDER DIFFERENTIALS. THE TAYLOR FORMULA 


We have seen that if $V$ and $W$ are normed linear spaces, and if $F$ is a
differentiable mapping from an open set $A$ in $V$ to $W$, then its differential $dF = dF_{(\cdot)}$
is a mapping from $A$ to $\operatorname{Hom}(V, W)$. If this mapping is differentiable on $A$, then
its differential $d(dF) = d(dF)_{(\cdot)}$ is a mapping from $A$ to $\operatorname{Hom}(V, \operatorname{Hom}(V, W))$.
We remember that an element of $\operatorname{Hom}(V, \operatorname{Hom}(V, W))$ is equivalent by duality
to a bilinear mapping from $V \times V$ to $W$, and if we designate the space of all such
bilinear mappings by $\operatorname{Hom}^2(V, W)$, then $d(dF)$ can be considered to be from $A$ to
$\operatorname{Hom}^2(V, W)$. We write $d(dF) = d^2F$, and call this mapping the second
differential of $F$. In Section 16 we saw that $d^2F_\alpha(\xi, \eta) = D_\xi(D_\eta F)(\alpha)$, and that if
$V = \mathbb{R}^n$, then

$$d^2F_a(b, c) = D_bD_cF(a) = \sum_{i,j=1}^n b_i c_j\,\frac{\partial^2F}{\partial x_i\,\partial x_j}(a).$$

The differentials of higher order are defined in the same way. If
$$d^2F\colon A \to \operatorname{Hom}^2(V, W)$$

is differentiable on $A$, then its differential, $d(d^2F) = d^3F$, is from $A$ to
$\operatorname{Hom}(V, \operatorname{Hom}^2(V, W)) = \operatorname{Hom}^3(V, W)$, the space of all trilinear mappings from
$V^3 = V \times V \times V$ to $W$. Continuing inductively, we arrive at the notion of the $n$th
differential of $F$ on $A$ as a mapping from $A$ to $\operatorname{Hom}(V, \operatorname{Hom}^{n-1}(V, W)) =
\operatorname{Hom}^n(V, W)$, the space of all $n$-linear mappings from $V^n$ to $W$. The theorem
that $d^2F_\alpha$ is a symmetric element of $\operatorname{Hom}^2(V, W)$ extends inductively to show
that $d^nF_\alpha$ is a symmetric element of $\operatorname{Hom}^n(V, W)$. We shall omit this proof.

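Our theorem on the evaluation of the second differential by mixed directional
derivatives also generalizes by induction to give

$$D_{\xi_1} \cdots D_{\xi_n}F(\alpha) = d^nF_\alpha(\xi_1, \ldots, \xi_n),$$
for starting from the left-hand term, we have

$$\begin{aligned}
D_{\xi_1}\big((D_{\xi_2} \cdots D_{\xi_n}F)(\cdot)\big)\big|_\alpha &= d\big((D_{\xi_2} \cdots D_{\xi_n}F)(\cdot)\big)_\alpha(\xi_1) \\
&= d\big(d^{n-1}F_{(\cdot)}(\xi_2, \ldots, \xi_n)\big)_\alpha(\xi_1) \\
&= d\big(\mathrm{ev}_{\langle\xi_2, \ldots, \xi_n\rangle} \circ d^{n-1}F_{(\cdot)}\big)_\alpha(\xi_1) \\
&= \big[\mathrm{ev}_{\langle\xi_2, \ldots, \xi_n\rangle} \circ d(d^{n-1}F)_\alpha\big](\xi_1) \\
&= \mathrm{ev}_{\langle\xi_2, \ldots, \xi_n\rangle}\big(d^nF_\alpha(\xi_1)\big) \\
&= \big(d^nF_\alpha(\xi_1)\big)(\xi_2, \ldots, \xi_n) = d^nF_\alpha(\xi_1, \ldots, \xi_n).
\end{aligned}$$
If $V = \mathbb{R}^n$, then our conclusions about partial derivatives extend inductively
in the same way to show that $F$ has continuous differentials on $A$ up through
order $m$ if and only if all the $m$th-order partial derivatives $\partial^mF/\partial x_{i_1} \cdots \partial x_{i_m}$
exist and are continuous on $A$, with

$$d^mF_a(c^1, \ldots, c^m) = \sum_{i_1, \ldots, i_m = 1}^n c^1_{i_1} \cdots c^m_{i_m}\,\frac{\partial^mF}{\partial x_{i_1} \cdots \partial x_{i_m}}(a).$$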


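We now consider the behavior of $F$ along the line $t \mapsto \alpha + t\eta$, where, of
course, $\alpha$ and $\eta$ are fixed. If $h(t) = F(\alpha + t\eta)$, then we can prove by induction
that

$$\frac{d^jh}{dt^j} = (D_\eta)^jF(\alpha + t\eta).$$

We know this to be true for $j = 1$ by Theorem 7.2, and assuming it for $j = m$,
we have, by the same theorem,

$$\frac{d^{m+1}h}{dt^{m+1}} = \frac{d}{dt}(D_\eta^mF)(\alpha + t\eta) = d(D_\eta^mF)_{\alpha+t\eta}(\eta) = D_\eta(D_\eta^mF)(\alpha + t\eta) = D_\eta^{m+1}F(\alpha + t\eta).$$

Now suppose that $F$ is real-valued ($W = \mathbb{R}$). We then have Taylor's formula:

$$h(t) = h(0) + t\,h'(0) + \cdots + \frac{t^m}{m!}\,h^{(m)}(0) + \frac{t^{m+1}}{(m+1)!}\,h^{(m+1)}(kt)$$
for some $k$ between $0$ and $1$. Taking $t = 1$ and substituting from above, we have

$$F(\alpha + \eta) = F(\alpha) + D_\eta F(\alpha) + \cdots + \frac{1}{m!}\,D_\eta^mF(\alpha) + \frac{1}{(m+1)!}\,D_\eta^{m+1}F(\alpha + k\eta),$$

which is the general Taylor formula in the normed linear space context. In
terms of differentials, it is

$$F(\alpha + \eta) = F(\alpha) + dF_\alpha(\eta) + \cdots + \frac{1}{m!}\,d^mF_\alpha(\eta, \ldots, \eta)
+ \frac{1}{(m+1)!}\,d^{m+1}F_{\alpha+k\eta}(\eta, \ldots, \eta).$$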
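The Taylor formula with remainder can be tested on a function whose iterated directional derivatives are known in closed form. In the sketch below the choice $F(x, y) = e^{x + 2y}$ is an assumption made for convenience: each application of $D_\eta$ multiplies $F$ by $\eta_1 + 2\eta_2$, so $D_\eta^jF = (\eta_1 + 2\eta_2)^jF$. The names are hypothetical.

```python
import math

def F(x, y):
    """Illustrative function (an assumption): F(x, y) = exp(x + 2y)."""
    return math.exp(x + 2.0 * y)

def D_eta_power(j, alpha, eta):
    """(D_eta)^j F at alpha; for this F each D_eta multiplies by eta_1 + 2*eta_2."""
    return (eta[0] + 2.0 * eta[1]) ** j * F(alpha[0], alpha[1])

def taylor_polynomial(m, alpha, eta):
    """F(alpha) + D_eta F(alpha) + ... + (1/m!) D_eta^m F(alpha)."""
    return sum(D_eta_power(j, alpha, eta) / math.factorial(j) for j in range(m + 1))

def remainder(m, alpha, eta):
    """F(alpha + eta) minus the Taylor polynomial of order m."""
    return F(alpha[0] + eta[0], alpha[1] + eta[1]) - taylor_polynomial(m, alpha, eta)

def remainder_bound(m, alpha, eta):
    """Bound on |D_eta^{m+1} F(alpha + k*eta)| / (m+1)! over 0 <= k <= 1.
    F is monotone along the segment, so its maximum is at an endpoint."""
    c = abs(eta[0] + 2.0 * eta[1])
    sup_F = max(F(alpha[0], alpha[1]), F(alpha[0] + eta[0], alpha[1] + eta[1]))
    return c ** (m + 1) * sup_F / math.factorial(m + 1)

alpha, eta = (0.1, -0.2), (0.3, 0.4)
```

For each order $m$, `abs(remainder(m, alpha, eta))` should not exceed `remainder_bound(m, alpha, eta)`, which is what the formula with the intermediate point $\alpha + k\eta$ asserts for this $F$.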


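If $V = \mathbb{R}^n$, then $D_yG = \sum_1^n y_i\,\partial G/\partial x_i$, and so the general term in the
Taylor expansion is

$$\frac{1}{m!}\,D_y^mF(a) = \frac{1}{m!}\sum_{i_1, \ldots, i_m = 1}^n y_{i_1} \cdots y_{i_m}\,\frac{\partial^mF}{\partial x_{i_1} \cdots \partial x_{i_m}}(a).$$

If $m = n = 2$, and if we use the notation $x = \langle x, y\rangle$, $s = \langle s, t\rangle$, then

$$\frac{1}{2!}\,D_s^2F(a) = \frac12\Big[s^2\,\frac{\partial^2F}{\partial x^2}(a) + 2st\,\frac{\partial^2F}{\partial x\,\partial y}(a) + t^2\,\frac{\partial^2F}{\partial y^2}(a)\Big].$$

The above description is logically simple, but it is inefficient in that it repeats
identical terms such as $y_1y_2(\partial^2F/\partial x_1\,\partial x_2)$ and $y_2y_1(\partial^2F/\partial x_2\,\partial x_1)$. We conclude
by describing for the interested reader the modern "multi-index" notation for this
very complicated situation.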

Remember that we are looking at the $m$th term of the Taylor formula for $F$,
and that $F$ has $n$ variables.

For any $n$-tuple $k = \langle k_1, \ldots, k_n\rangle$ of nonnegative integers, we define $|k|$
as $\sum_1^n k_i$, and for $x \in \mathbb{R}^n$, we set $x^k = x_1^{k_1}x_2^{k_2} \cdots x_n^{k_n}$. Also we set $F_k =
F_{k_1 k_2 \cdots k_n}$, or better, if $D_jF = \partial F/\partial x_j$, we set

$$D^kF = D_1^{k_1}D_2^{k_2} \cdots D_n^{k_n}F = F_k.$$

Finally, we set $k! = k_1!\,k_2! \cdots k_n!$, and if $p \ge |k|$, we set $\binom{p}{k} = p!/k!\,(p - |k|)!$.
Then the $m$th term of the Taylor expansion of $F$ is

$$\frac{1}{m!}\sum_{|k| = m}\binom{m}{k}\,D^kF(a)\,x^k,$$
which is surely a notational triumph.
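The saving achieved by the multi-index notation can be seen by counting. The sketch below (function names are hypothetical) enumerates the multi-indices $k$ with $|k| = m$, compares their number with the $n^m$ index sequences $i_1, \ldots, i_m$ of the earlier formula, and checks the grouping coefficients $m!/k!$ against the multinomial identity $(x_1 + \cdots + x_n)^m = \sum_{|k| = m} (m!/k!)\,x^k$, which is a standard fact and not something stated in the text.

```python
from itertools import product
from math import factorial, prod

def multi_indices(n, m):
    """All n-tuples k of nonnegative integers with |k| = m."""
    return [k for k in product(range(m + 1), repeat=n) if sum(k) == m]

def multi_factorial(k):
    """k! = k_1! k_2! ... k_n!."""
    return prod(factorial(ki) for ki in k)

def power(x, k):
    """x^k = x_1^{k_1} ... x_n^{k_n}."""
    return prod(xi ** ki for xi, ki in zip(x, k))

def count_terms(n, m):
    """(number of index sequences i_1..i_m, number of multi-indices with |k| = m)."""
    return n ** m, len(multi_indices(n, m))

def multinomial_sum(x, m):
    """Sum over |k| = m of (m!/k!) x^k; equals (x_1 + ... + x_n)^m."""
    return sum(
        factorial(m) // multi_factorial(k) * power(x, k)
        for k in multi_indices(len(x), m)
    )
```

With $n = 3$ and $m = 4$, for example, the 81 terms of the repeated-index sum collapse to 15 multi-index terms, each weighted by the number $m!/k!$ of index sequences it absorbs.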

The general Taylor formula is too cumbersome to be of much use in practice;
it is principally of theoretical value. The Taylor expansions that we actually
compute are generally found by other means, such as substitution of a
polynomial (or power series) in a power series. For example,

$$\begin{aligned}
\sin(x + y^2) &= (x + y^2) - \frac{(x + y^2)^3}{3!} + \frac{(x + y^2)^5}{5!} - \cdots \\
&= x + y^2 - \frac{x^3}{6} - \frac{x^2y^2}{2} + \Big(\frac{x^5}{120} - \frac{xy^4}{2}\Big) + \cdots.
\end{aligned}$$


A mapping from $A$ to $W$ which has continuous differentials of all orders
through $k$ is said to be of class $C^k$ on $A$, and the collection of all such mappings
is designated $C^k(A, W)$ or $\mathcal{C}^k(A, W)$. It is clear that $C^k(A, W)$ is a vector space
(induction). Moreover, it can also be shown by induction that a composition
of $C^k$-maps is itself of class $C^k$. This depends on recognizing the general form of
the $m$th differential of a composition $F \circ G$ as being a finite sum, each term of
which is a composition of functions chosen from $F, dF, \ldots, d^mF, G, dG, \ldots, d^mG$.

Functions of many variables are involved in these calculations, and it is
simplest to treat each as a function of a single $n$-tuplet variable and to apply the
obvious corollary of Theorem 8.1 that if $G^1, \ldots, G^n$ are of class $C^k$, then so is
$G = \langle G^1, \ldots, G^n\rangle$, with $d^kG = \langle d^kG^1, \ldots, d^kG^n\rangle$. As a special case of
composition, we can conclude that a product of $C^k$-maps is of class $C^k$.

We shall see in the next chapter that $\varphi\colon T \mapsto T^{-1}$ is a differentiable map on
the open set of invertible elements in $\operatorname{Hom} V$ (if $V$ is a Banach space) and that
$d\varphi_T(H) = -T^{-1}HT^{-1}$. Since $\langle S, H, T\rangle \mapsto S^{-1}HT^{-1}$ then has continuous
partial differentials, we can continue, and another induction shows that $\varphi$ is of class
$C^k$ for every $k$ and that $d^m\varphi_T(H_1, \ldots, H_m)$ is a finite sum of finite products of
$T^{-1}, H_1, \ldots, H_m$. It then follows that a function $F$ defined implicitly by a
$C^k$-function $G$ is also of class $C^k$, for its differential, as computed in the
implicit-function theorem, is then a composition of maps of class $C^{k-1}$.
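The formula $d\varphi_T(H) = -T^{-1}HT^{-1}$ quoted here can be examined numerically in the simplest nontrivial case. The sketch below assumes $V = \mathbb{R}^2$, so that elements of $\operatorname{Hom} V$ are $2 \times 2$ matrices stored as nested lists; the particular matrices $T$ and $H$ and the helper names are illustrative only.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(A):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def d_inverse(T, H):
    """Claimed differential of T -> T^{-1} at T, applied to H: -T^{-1} H T^{-1}."""
    Tinv = inverse(T)
    P = matmul(matmul(Tinv, H), Tinv)
    return [[-P[i][j] for j in range(2)] for i in range(2)]

def max_norm(A):
    """Largest absolute entry; any norm will do in finite dimensions."""
    return max(abs(v) for row in A for v in row)

def remainder_ratio(T, H, t):
    """||(T + tH)^{-1} - T^{-1} - t * dphi_T(H)|| / t; should tend to 0 with t."""
    Tt = [[T[i][j] + t * H[i][j] for j in range(2)] for i in range(2)]
    D = d_inverse(T, H)
    inv_t, inv_0 = inverse(Tt), inverse(T)
    R = [[inv_t[i][j] - inv_0[i][j] - t * D[i][j] for j in range(2)]
         for i in range(2)]
    return max_norm(R) / t

T = [[2.0, 1.0], [1.0, 3.0]]
H = [[0.5, -1.0], [2.0, 0.25]]
ratios = [remainder_ratio(T, H, t) for t in (1e-1, 1e-2, 1e-3)]
```

The ratios decrease roughly in proportion to $t$, as one would expect if the remainder is dominated by a second-order term of the form $t^2\,T^{-1}HT^{-1}HT^{-1}$, itself a finite product of $T^{-1}$ and $H$ of the kind the paragraph describes.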

A mapping $F$ which is of class $C^k$ for all $k$ is said to be of class $C^\infty$, and it
follows from our remarks above that the family of $C^\infty$-maps is closed under all
the operations that we have met in the calculus. If the domain of $F$ is an open
set in $\mathbb{R}^n$, then $F \in \mathcal{C}^\infty(A, W)$ if and only if all the partial derivatives of $F$ exist
and are continuous on $A$.


CHAPTER 4 


COMPACTNESS AND COMPLETENESS 


In this chapter we shall investigate two properties of subsets of a normed linear 
space V which are concerned with the fact that in a certain sense all the points 
which ought to be there really are there. These notions are largely independent 
of the algebraic structure of V, and we shall therefore study them in their own 
most natural setting, that of metric spaces. The stronger of these two properties, 
compaciness, helps to explain why the theory of finite-dimensional spaces is so 
simple and satisfactory. The weaker property, compleieness, is shared by 
important infinite-dimensional normed linear spaces, and allows us to treat 
these spaces in almost as satisfactory a way. 

It is these properties that save the calculus from being largely a formal 
theory. They allow us to define crucial elements by limiting processes, and are 
responsible, for example, for an infinite series having a sum, a continuous real- 
valued function essuming a maximum value, and a definite integral existing. 
For the real number system itself, the compactness property is equivalent to the 
least upper bound property, which has already been an absolutely essential tool 
in our construction of the differential calculus in Chapter 3.

In Sections 8 through 10 we shall apply completeness to the calculus. The 
first of these sections is devoted to the existence and differentiability of functions 
defined by power series, and since we want to include power series in an operator 
T, we shall take the occasion to introduce and exploit the notion of a Banach 
algebra. Next we shall prove the contraction mapping fixed-point theorem, which 
is the missing ingredient in our unfinished proof of the implicit-function theorem 
in Chapter 3 and which will be the basis for the fundamental existence and 
uniqueness theorem for ordinary differential equations in Chapter 6. In Section 
10 we shall prove a simple extension theorem for linear mappings into a complete 
normed linear space and apply it to construct the Riemann integral of a
parametrized arc.


1. METRIC SPACES; OPEN AND CLOSED SETS 


In the preceding chapter we occasionally treated questions of convergence and
continuity in situations where the domain was an arbitrary subset A of a normed
linear space V. In such discussions the algebraic structure of V fades into the
background, and the vector operations of V are used only to produce the
combination $\|\alpha - \beta\|$, which is interpreted as the distance from $\alpha$ to $\beta$. If we distill
out of these contexts what is essential to the convergence and continuity
arguments, we find that we need a space A and a function $\rho: A \times A \to \mathbb{R}$, $\rho(x, y)$
being called the distance from x to y, such that


1) $\rho(x, y) > 0$ if $x \ne y$, and $\rho(x, x) = 0$;
2) $\rho(x, y) = \rho(y, x)$ for all $x, y \in A$;
3) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$ for all $x, y, z \in A$.


Any set A together with such a function $\rho$ from $A \times A$ to $\mathbb{R}$ is called a metric
space; the function $\rho$ is the metric. It is obvious that a normed linear space is a
metric space under the norm metric $\rho(\alpha, \beta) = \|\alpha - \beta\|$ and that any subset B
of a metric space A is itself a metric space under $\rho \restriction B \times B$. If we start with a
nice intuitive space, like $\mathbb{R}^n$ under one of its standard norms, and choose a weird
subset B, it will be clear that a metric space can be a very odd object, and may
fail to have almost any property one can think of.
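The three axioms can be spot-checked mechanically on a finite sample of points. The following Python sketch is an illustration only and is not part of the original development; the names is_metric, norm_metric, and squared_distance are hypothetical helpers, and a finite check of this kind can refute the axioms for a candidate function but can never prove them.

```python
import itertools
import math

def is_metric(points, rho, tol=1e-12):
    """Check the three metric axioms on a finite list of points."""
    for x, y in itertools.product(points, repeat=2):
        d = rho(x, y)
        if x == y and abs(d) > tol:        # axiom 1: rho(x, x) = 0
            return False
        if x != y and d <= 0:              # axiom 1: rho(x, y) > 0 if x != y
            return False
        if abs(d - rho(y, x)) > tol:       # axiom 2: symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:   # axiom 3: triangle inequality
            return False
    return True

def norm_metric(x, y):
    """Euclidean norm metric rho(x, y) = ||x - y|| on R^2."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def squared_distance(x, y):
    """Not a metric: the triangle inequality fails for collinear points."""
    return norm_metric(x, y) ** 2

sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 1.5)]
```

On this sample the norm metric passes, while the squared distance fails axiom (3) on the three collinear points.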

Metric spaces very often arise in practice as subsets of normed linear 
spaces with the norm metric, but they come from other sources too. Even in the 
normed linear space context, metrics other than the norm metric are used. 
For example, S might be a two-dimensional spherical surface in $\mathbb{R}^3$, say
$S = \{x : \sum_{1}^{3} x_i^2 = 1\}$, and $\rho(x, y)$ might be the great circle distance from x to y. Or,
more generally, S might be any smooth two-dimensional surface in $\mathbb{R}^3$, and
$\rho(x, y)$ might be the length of the shortest curve connecting x to y in S.
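To make the spherical example concrete, here is a small Python sketch comparing the great circle distance with the norm (chordal) distance on the unit sphere. It assumes the points are given as unit vectors in R^3; the function names great_circle and chord and the three sample points are illustrative choices, not notation from the text.

```python
import math

def great_circle(x, y):
    """Great circle distance between unit vectors x, y in R^3."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))   # clamp against rounding error

def chord(x, y):
    """Norm (chordal) distance ||x - y|| in R^3."""
    return math.dist(x, y)

north = (0.0, 0.0, 1.0)
east = (1.0, 0.0, 0.0)
south = (0.0, 0.0, -1.0)
```

The two metrics disagree (for antipodal points the chord has length 2 and the great circle arc has length pi), yet both satisfy the axioms on S.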

In this chapter we shall adopt the metric space context for our arguments 
wherever it is appropriate, so that the student may become familiar with this 
more general but very intuitive notion. We begin by reproducing the basic 
definitions in the language of metrics. Because the scalar-vector dichotomy is
not a factor in this context, we shall drop our convention that points be repre- 
sented by Greek or boldface roman letters and shall use whatever letters we wish. 


Definition. If X and Y are metric spaces, then $f: X \to Y$ is continuous at
$a \in X$ if for every $\epsilon$ there is a $\delta$ such that


\[ \rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon. \]


Here we have used the same symbol ‘$\rho$’ for metrics on different spaces, just
as earlier we made ambiguous use of the norm symbol.


Definition. The (open) ball of radius r about p, $B_r(p)$, is simply the set of
points whose distance from p is less than r:


\[ B_r(p) = \{x : \rho(x, p) < r\}. \]


Definition. A subset $A \subset X$ is open if every point p in A is the center of
some ball included in A, that is, if


\[ (\forall p \in A)(\exists r > 0)(B_r(p) \subset A). \]




Lemma 1.1. Every ball is open; in fact, if $q \in B_r(p)$ and $\delta = r - \rho(p, q)$,
then $B_\delta(q) \subset B_r(p)$.


Proof. This amounts to the triangle inequality. For, if $x \in B_\delta(q)$, then
$\rho(x, q) < \delta$ and $\rho(x, p) \le \rho(x, q) + \rho(q, p) < \delta + \rho(p, q) = r$, so that $x \in B_r(p)$.
Thus $B_\delta(q) \subset B_r(p)$. □
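The inclusion in Lemma 1.1 can be observed numerically. The sketch below is an illustration under stated assumptions: it works in R^2 with the Euclidean metric, draws random points of the smaller ball by rejection sampling, and tests that each lies in the larger ball. The helper names sample_ball and inner_ball_is_contained are hypothetical, and sampling can only support the lemma, not prove it.

```python
import math
import random

def rho(x, y):
    """Euclidean metric on R^2."""
    return math.dist(x, y)

def sample_ball(center, radius, rng):
    """Random point of the open ball B_radius(center), by rejection sampling."""
    while True:
        pt = (center[0] + rng.uniform(-radius, radius),
              center[1] + rng.uniform(-radius, radius))
        if rho(pt, center) < radius:
            return pt

def inner_ball_is_contained(p, r, q, trials=2000, seed=0):
    """Sample B_delta(q) with delta = r - rho(p, q); test membership in B_r(p)."""
    rng = random.Random(seed)
    delta = r - rho(p, q)
    assert delta > 0, "q must lie in the open ball B_r(p)"
    return all(rho(sample_ball(q, delta, rng), p) < r for _ in range(trials))
```

For instance, inner_ball_is_contained((0.0, 0.0), 1.0, (0.6, 0.3)) samples the ball of radius $1 - \rho(p, q)$ about q and checks each sample against the unit ball about p.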

Lemma 1.2. If p is held fixed, then $\rho(p, x)$ is a continuous function of x.


Proof. A symbol-by-symbol paraphrase of Lemma 3.1 of Chapter 3 shows that
$|\rho(p, x) - \rho(p, y)| \le \rho(x, y)$, so that $\rho(p, x)$ is actually a Lipschitz function
with constant 1. □
Theorem 1.1. The family $\mathcal{T}$ of all open subsets of a metric space S has the
following properties:
1) The union of any collection of open sets is open; that is, if
$\{A_i : i \in I\} \subset \mathcal{T}$, then $\bigcup_{i \in I} A_i \in \mathcal{T}$.
2) The intersection of two open sets is open; that is, if $A, B \in \mathcal{T}$, then
$A \cap B \in \mathcal{T}$.
3) $\emptyset, S \in \mathcal{T}$.

Proof. These properties follow immediately from the definition. Thus any
point p in $\bigcup_i A_i$ lies in some $A_j$, and therefore, since $A_j$ is open, some ball about p
is a subset of $A_j$ and hence of the larger set $\bigcup_i A_i$. □


Corollary. A set is open if and only if it is a union of open balls. 


Proof. This follows from the definition of open set, the lemma above, and
property (1) of the theorem. □


The union of all the open subsets of an arbitrary set A is an open subset of
A, by (1), and therefore is the largest open subset of A. It is called the interior
of A and is designated $A^{\mathrm{int}}$. Clearly, p is in $A^{\mathrm{int}}$ if and only if some ball about p
is a subset of A, and it is helpful to visualize $A^{\mathrm{int}}$ as the union of all the balls
that are in A.


Definition. A set A is closed if A’ is open. 


The theorem above and De Morgan’s law (Section 0.11) then yield the 
following complementary set of properties for closed sets. 

Theorem 1.2

1) The intersection of any family of closed sets is closed.
2) The union of two closed sets is closed.
3) $\emptyset$ and the whole space S are closed.


Proof. Suppose, for example, that $\{B_i : i \in I\}$ is a family of closed sets. Then
the complement $B_i'$ is open for each i, so that $\bigcup_i B_i'$ is open by the above theorem.
Also, $\bigcup_i B_i' = (\bigcap_i B_i)'$ by De Morgan’s law (see Section 0.11). Thus $\bigcap_i B_i$ is
the complement of an open set, and is closed. □


Continuing our “complementary” development, we define the closure, $\bar{A}$, of
an arbitrary set A as the intersection of all closed sets including A, and we have
from (1) above that $\bar{A}$ is the smallest closed set including A. De Morgan’s law
implies the important identity


\[ (\bar{A})' = (A')^{\mathrm{int}}. \]


For F is a closed superset of A if and only if its complement $U = F'$ is an open
subset of $A'$. By De Morgan’s law the complement of the intersection of all such
sets F is the union of all such sets U. That is, the complement of $\bar{A}$ is $(A')^{\mathrm{int}}$.
This identity yields a direct characterization of closure:


Lemma 1.3. A point p is in $\bar{A}$ if and only if every ball about p intersects A.


Proof. A point p is not in $\bar{A}$ if and only if p is in the interior of $A'$, that is, if and
only if some ball about p does not intersect A. Negating the extreme members
of this equivalence gives the lemma. □


Definition. The boundary, $\partial A$, of an arbitrary set A is the difference
between its closure and its interior. Thus


\[ \partial A = \bar{A} - A^{\mathrm{int}}. \]


Since $A - B = A \cap B'$, we have the symmetric characterization
$\partial A = \bar{A} \cap \overline{(A')}$. Therefore, $\partial A = \partial(A')$; also,


$p \in \partial A$ if and only if every ball about p intersects both A and $A'$.


Example. A ball $B_r(\alpha)$ is an open set. In a normed linear space the closure of
$B_r(\alpha)$ is the closed ball about $\alpha$ of radius r, $\{\xi : \rho(\xi, \alpha) \le r\}$. This is easily seen
from Lemma 1.3. The boundary $\partial B_r(\alpha)$ is then the spherical surface of radius r
about $\alpha$, $\{\xi : \rho(\xi, \alpha) = r\}$. If some but not all of the points of this surface are
added to the open ball, we obtain a set that is neither open nor closed. The
student should expect that a random set he may encounter will be neither open
nor closed.


Continuous functions furnish an important source of open and closed sets
by the following lemma. 


Lemma 1.4. If X and Y are metric spaces, and if f is a continuous mapping
from X to Y, then $f^{-1}[A]$ is open in X whenever A is open in Y.


Proof. If $p \in f^{-1}[A]$, then $f(p) \in A$, and, since A is open, some ball $B_\epsilon(f(p))$
is a subset of A. But the continuity of f at p says exactly that there is a $\delta$ such
that $f[B_\delta(p)] \subset B_\epsilon(f(p))$. In particular, $f[B_\delta(p)] \subset A$ and $B_\delta(p) \subset f^{-1}[A]$.
Thus for each p in $f^{-1}[A]$ there is a ball about p included in $f^{-1}[A]$, and this set
is therefore open. □




Since $f^{-1}[A'] = (f^{-1}[A])'$, we also have the following corollary.


Corollary. If $f: X \to Y$ is continuous, then $f^{-1}[C]$ is closed in X whenever C
is closed in Y.


The converses of both of these results hold as well. As an example of the use
of this lemma, consider for a fixed $\alpha \in X$ the continuous function $f: X \to \mathbb{R}$
defined by $f(\xi) = \rho(\xi, \alpha)$. The sets $(-r, r)$, $[0, r]$, and $\{r\}$ are respectively open,
closed, and closed subsets of $\mathbb{R}$. Therefore, their inverse images under f (the ball
$B_r(\alpha)$, the closed ball $\{\xi : \rho(\xi, \alpha) \le r\}$, and the spherical surface $\{\xi : \rho(\xi, \alpha) = r\}$)
are respectively open, closed, and closed in X. In particular, the triangle
inequality argument demonstrating directly that $B_r(\alpha)$ is open is now seen to be
unnecessary by virtue of the triangle inequality argument that demonstrates the
continuity of the distance function (Lemma 1.2).

It is not true that continuous functions take closed sets into closed sets in the
forward direction. For example, if $f: \mathbb{R} \to \mathbb{R}$ is the arc tangent function, then
$f[\mathbb{R}] = \mathrm{range}\, f = (-\pi/2, \pi/2)$, which is not a closed subset of $\mathbb{R}$. The reader
may feel that this example cheats and that we should only expect the f-image of a
closed set to be a closed subset of the metric space that is the range of f. He
might then consider $f(x) = 2x/(1 + x^2)$ from $\mathbb{R}$ to its range $[-1, 1]$. The set
of positive integers $\mathbb{Z}^+$ is a closed subset of $\mathbb{R}$, but $f[\mathbb{Z}^+] = \{2n/(1 + n^2)\}_1^\infty$ is not
closed in $[-1, 1]$, since 0 is clearly in its closure.
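The second example can be tabulated exactly. The Python sketch below, an illustration rather than part of the argument, evaluates $f(n) = 2n/(1 + n^2)$ on the first ten thousand positive integers using exact rational arithmetic; the name values is a hypothetical choice.

```python
from fractions import Fraction

def f(x):
    """f(x) = 2x / (1 + x^2), a continuous map of R onto [-1, 1]."""
    return Fraction(2 * x, 1 + x * x)

# Image of the first positive integers: every value is strictly positive,
# yet the values become arbitrarily small, so 0 lies in the closure of the
# image without belonging to it.
values = [f(n) for n in range(1, 10_001)]
```

Every entry of values is positive, while the later entries fall below any prescribed positive bound, which is the sense in which 0 belongs to the closure of the image but not to the image.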

The distance between two nonempty sets A and B, $\rho(A, B)$, is defined as
$\mathrm{glb}\, \{\rho(a, b) : a \in A \text{ and } b \in B\}$. If A and B intersect, the distance is zero.
If A and B are disjoint, the distance may still be zero. For example, the interior
and exterior of a circle in the plane are disjoint open sets whose distance apart is
zero. The x-axis and (the graph of) the function $f(x) = 1/x$ are disjoint closed
sets whose distance apart is zero. As we have remarked earlier, a set A is closed
if and only if every point not in A is a positive distance from A. More generally,
for any set A a point p is in $\bar{A}$ if and only if $\rho(p, A) = 0$.
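The x-axis example can be made quantitative. In the Python sketch below (an added illustration; dist_to_x_axis, on_hyperbola, and partial_infima are hypothetical names) the distance from the point (t, 1/t) to the x-axis is |1/t|, so the greatest lower bound over larger and larger portions of the graph shrinks toward zero without ever reaching it.

```python
def dist_to_x_axis(point):
    """Distance from a point of R^2 to the x-axis {(s, 0)}."""
    return abs(point[1])

def on_hyperbola(t):
    """The point (t, 1/t) of the graph of f(x) = 1/x, for t != 0."""
    return (t, 1.0 / t)

# Smallest distance found among the points with t = 1, ..., n
partial_infima = [
    min(dist_to_x_axis(on_hyperbola(t)) for t in range(1, n + 1))
    for n in (10, 100, 1000)
]
```

No single pair of points realizes distance zero, since the sets are disjoint; only the greatest lower bound is zero.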

We list below some simple properties of the distance between subsets of a 
normed linear space. 


1) Distance is unchanged by a translation: $\rho(A, B) = \rho(A + \gamma, B + \gamma)$
(because $\|(\alpha + \gamma) - (\beta + \gamma)\| = \|\alpha - \beta\|$).

2) $\rho(kA, kB) = |k|\,\rho(A, B)$ (because $\|k\alpha - k\beta\| = |k|\,\|\alpha - \beta\|$).

3) If N is a subspace, then the distance from B to N is unchanged when we
translate B through a vector in N: $\rho(N, B) = \rho(N, B + \eta)$ if $\eta \in N$
(because $N - \eta = N$).

4) If $T \in \mathrm{Hom}(V, W)$, then $\rho(T[A], T[B]) \le \|T\|\,\rho(A, B)$ (because
$\|T(\alpha) - T(\beta)\| \le \|T\| \cdot \|\alpha - \beta\|$).


Lemma 1.5. If N is a proper closed subspace and $0 < \epsilon < 1$, there exists an
$\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) > 1 - \epsilon$.




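Proof. Choose any $\beta \notin N$. Then $\rho(\beta, N) > 0$ (because N is closed), and there
exists an $\eta \in N$ such that


\[ \|\beta - \eta\| < \rho(\beta, N)/(1 - \epsilon) \]

[by the definition of $\rho(\beta, N)$]. Set $\alpha = (\beta - \eta)/\|\beta - \eta\|$. Then $\|\alpha\| = 1$ and


\[ \rho(\alpha, N) = \rho(\beta - \eta, N)/\|\beta - \eta\| = \rho(\beta, N)/\|\beta - \eta\| > \rho(\beta, N)(1 - \epsilon)/\rho(\beta, N) = 1 - \epsilon, \]


by (2), (3), and the definition of $\eta$. □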
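The construction in the proof can be followed on a small finite-dimensional instance. The sketch below is only an illustration under stated assumptions: V is R^2 with the Euclidean norm, N is the x-axis, and the particular choices of beta, eta, and eps (together with the helper names riesz_vector and dist_to_N) are hypothetical.

```python
import math

def riesz_vector(beta, eta):
    """alpha = (beta - eta) / ||beta - eta||, for beta not in N and eta in N."""
    diff = (beta[0] - eta[0], beta[1] - eta[1])
    length = math.hypot(*diff)
    return (diff[0] / length, diff[1] / length)

def dist_to_N(v):
    """Distance from v to N = x-axis in the Euclidean norm."""
    return abs(v[1])

eps = 0.1
beta = (3.0, 4.0)     # beta is not in N, and rho(beta, N) = 4
eta = (2.0, 0.0)      # eta in N with ||beta - eta|| < rho(beta, N) / (1 - eps)
alpha = riesz_vector(beta, eta)
```

Here alpha has norm 1 and its distance from N exceeds 1 - eps, as the lemma asserts; choosing eta nearer to the foot of the perpendicular from beta pushes that distance closer to 1.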


The reader may feel that we ought to be able to improve this lemma. Surely,
all we have to do is choose the point in N which is closest to $\beta$, and so obtain
$\|\beta - \eta\| = \rho(\beta, N)$, giving finally a vector $\alpha$ such that $\|\alpha\| = 1$ and $\rho(\alpha, N) = 1$.
However, this is a matter on which our intuition lets us down: if N is
infinite-dimensional, there may not be a closest point $\eta$! For example, as we shall see
later in the exercises of Chapter 5, if V is the space $C([-1, 1])$ under the
two-norm $\|f\| = (\int_{-1}^{1} f^2)^{1/2}$, and if N is the set of functions g in V such that $\int_0^1 g = 0$,
then N is a closed subspace for which we cannot find such a “best” $\alpha$. But if N is
finite-dimensional, we can always find such a point, and if V is a Hilbert space
(see Chapter 5) we can also.


EXERCISES 


1.1 Write out the proof of Lemma 1.2.

1.2 Prove (2) and (3) of Theorem 1.1.

1.3 Prove (2) of Theorem 1.2.

1.4 It is not true that the intersection of a sequence of open sets is necessarily open.
Find a counterexample in $\mathbb{R}$.

1.5 Prove the corollary of Lemma 1.4.

1.6 Prove that $p \in \bar{A}$ if and only if $\rho(p, A) = 0$.

1.7 Let X and Y be metric spaces, and let $f: X \to Y$ have the property that $f^{-1}[B]$
is open in X whenever B is open in Y. Prove that f is continuous.

1.8 Show that $\rho(x, A) = \rho(x, \bar{A})$.

1.9 Show that $\rho(x, A)$ is a continuous function of x. (In fact, it is Lipschitz
continuous.)

1.10 Invent metric spaces S (by choosing subsets of $\mathbb{R}^2$) having the following
properties:

1) S has n points.
2) S is infinite and $\rho(x, y) \ge 1$ if $x \ne y$.
3) S has a ball $B_1(a)$ such that the closed ball $\{x : \rho(x, a) \le 1\}$ is not the same as
the closure of $B_1(a)$.




1.11 Prove that in a normed linear space a closed ball is the closure of the
corresponding open ball.

1.12 Show that if $f: X \to Y$ and $g: Y \to Z$ are continuous (where X, Y, and Z are
metric spaces), then so is $g \circ f$.

1.13 Let X and Y be metric spaces. Define the notion of a product metric on
$Z = X \times Y$. Define a 1-metric $\rho_1$ and a uniform metric $\rho_\infty$ on Z (showing that they are
metrics) in analogy with the 1-norm and uniform norm on a product of normed linear
spaces, and show that each is a product metric according to your definition above.

1.14 Do the same for a 2-metric $\rho_2$ on $Z = X \times Y$.

1.15 Let X and Y be metric spaces, and let V be a normed linear space. Let $f: X \to \mathbb{R}$
and $g: Y \to V$ be continuous maps. Prove that


\[ \langle x, y\rangle \mapsto f(x)g(y) \]


is a continuous map from $X \times Y$ to V.


*2. TOPOLOGY


If X is an arbitrary set and $\mathcal{T}$ is any family of subsets of X satisfying properties
(1) through (3) in Theorem 1.1, then $\mathcal{T}$ is called a topology on X. Theorem 1.1
thus asserts that the open subsets of a metric space X form a topology on X.
The subsequent definitions of interior, closed set, and closure were purely
topological in the sense that they depended only on the topology $\mathcal{T}$, as were
Theorem 1.2 and the identity $(\bar{A})' = (A')^{\mathrm{int}}$. The study of the consequences of
the existence of a topology is called general topology.

On the other hand, the definitions of balls and continuity given earlier were 
metric definitions, and therefore part of metric space theory. In metric spaces, 
then, we have not only the topology, but also our $\epsilon$-definitions of continuity and
balls and the spherical characterizations of closure and interior. 

The reader may be surprised to be told now that although continuity and 
convergence were defined metrically, they also have purely topological char- 
acterizations and are therefore topological ideas. This is easy to see if one keeps 
in mind that in a metric space an open set is nothing but a union of balls. We
have:


f is continuous at p if and only if for every open set A containing f(p) there
exists an open set B containing p such that $f[B] \subset A$.


This local condition involving behavior around a single point p is more
fluently rendered in terms of the notion of neighborhood. A set A is a
neighborhood of a point p if $p \in A^{\mathrm{int}}$. Then we have:


f is continuous at p if and only if for every neighborhood N of f(p), $f^{-1}[N]$
is a neighborhood of p.


Finally there is an elegant topological characterization of global continuity.
Suppose that $S_1$ and $S_2$ are topological spaces. Then $f: S_1 \to S_2$ is continuous
(everywhere) if and only if $f^{-1}[A]$ is open whenever A is open. Also, f is
continuous if and only if $f^{-1}[B]$ is closed whenever B is closed. These conditions
are not surprising in view of Lemma 1.4.


3. SEQUENTIAL CONVERGENCE 


In addition to shifting to the more general point of view of metric space theory,
we also want to add to our kit of tools the notion of sequential convergence,
which the reader will probably remember from his previous encounter with the
calculus. One of the principal reasons why metric space theory is simpler and
more intuitive than general topology is that nearly all metric arguments can be
presented in terms of sequential convergence, and in this chapter we shall
partially make up for our previous neglect of this tool by using it constantly and
in preference to other alternatives.


Definition. We say that the infinite sequence $\{x_n\}$ converges to the point a
if for every $\epsilon$ there is an N such that


\[ n > N \Rightarrow \rho(x_n, a) < \epsilon. \]


We also say that $x_n$ approaches a as n approaches (or tends to) infinity, and
we call a the limit of the sequence. In symbols we write $x_n \to a$ as $n \to \infty$, or
$\lim_{n \to \infty} x_n = a$. Formally, this definition is practically identical with our earlier
definition of function convergence, and where there are parallel theorems the
arguments that we use in one situation will generally hold almost verbatim in
the other. Thus the proof of Lemma 1.1 of Chapter 3 can be altered slightly
to give the following result.


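Lemma 3.1. If $\{\xi_i\}$ and $\{\eta_i\}$ are two sequences in a normed linear space V,
then
\[ \xi_i \to \alpha \text{ and } \eta_i \to \beta \;\Rightarrow\; \xi_i + \eta_i \to \alpha + \beta. \]


The main difference is that we now choose N as $\max\{N_1, N_2\}$ instead of
choosing $\delta$ as $\min\{\delta_1, \delta_2\}$. Similarly:


Lemma 3.2. If $\xi_i \to \alpha$ in V and $x_i \to a$ in $\mathbb{R}$, then $x_i\xi_i \to a\alpha$.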


As before, the definition begins with three quantifiers, $(\forall\epsilon)(\exists N)(\forall n)$. A
somewhat more idiomatic form can be obtained by rephrasing the definition in
terms of balls and the notion of “almost all n”. We say that P(n) is true for
almost all n if P(n) is true for all but a finite number of integers n, or equivalently,
if $(\exists N)(\forall n > N)P(n)$. Then we see that


$\lim x_n = a$ if and only if every ball about a contains almost all the $x_n$.
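The quantifier pattern "for every epsilon there is an N" can be read as a recipe: given epsilon, produce N. The Python sketch below does this for the particular sequence x_n = 1/n with limit 0, which is an illustrative choice and not an example from the text; witness_N and check_tail are hypothetical names, and the tail check inspects only finitely many terms, so it is a spot-check rather than a proof.

```python
import math

def witness_N(eps):
    """An N with n > N  =>  |1/n - 0| < eps, for the sequence x_n = 1/n."""
    return math.ceil(1.0 / eps)

def check_tail(eps, how_many=1000):
    """Spot-check finitely many terms beyond N (a check, not a proof)."""
    N = witness_N(eps)
    return all(abs(1.0 / n) < eps for n in range(N + 1, N + 1 + how_many))
```

In the language of the text, the ball of radius eps about 0 contains x_n for almost all n, the finitely many exceptions having index at most witness_N(eps).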


The following sequential characterization provides probably the most 
intuitive way of viewing the notion of closure and closed sets. 




Theorem 3.1. A point x is in the closure $\bar{A}$ of a set A if and only if there is a
sequence $\{x_n\}$ in A converging to x.


Therefore, a set A is closed if and only if every convergent sequence lying
in A has its limit in A.


Proof. If $\{x_n\} \subset A$ and $x_n \to x$, then every ball about x contains almost every
$x_n$, and so, in particular, intersects A. Thus $x \in \bar{A}$ by Lemma 1.3. Conversely,
if $x \in \bar{A}$, then every ball about x intersects A, and we can construct a sequence in
A that converges to x by choosing $x_n$ as any point in $B_{1/n}(x) \cap A$. Since A is
closed if and only if $A = \bar{A}$, the second statement of the theorem follows from
the first. □


There is also a sequential characterization of continuity which helps greatly 
in using the notion of continuity in a flexible way. Let X and Y be metric spaces, 
and let f be any function from X to Y.


Theorem 3.2. The function f is continuous at a if and only if, for any
sequence $\{x_n\}$ in X, if $x_n \to a$, then $f(x_n) \to f(a)$.


Proof. Suppose first that f is continuous at a, and let $\{x_n\}$ be any sequence
converging to a. Then, given any $\epsilon$, there is a $\delta$ such that


\[ \rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon, \]

by the continuity of f at a, and for this $\delta$ there is an N such that

\[ n > N \Rightarrow \rho(x_n, a) < \delta, \]


because $x_n \to a$. Combining these implications, we see that given $\epsilon$ we have
found N so that $n > N \Rightarrow \rho(f(x_n), f(a)) < \epsilon$. That is, $f(x_n) \to f(a)$.

Now suppose that f is not continuous at a. In considering such a negation
it is important that implicit universal quantifiers be made explicit. Thus,
formally, we are assuming that
$\sim(\forall\epsilon)(\exists\delta)(\forall x)(\rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon)$,
that is, that $(\exists\epsilon)(\forall\delta)(\exists x)(\rho(x, a) < \delta \;\&\; \rho(f(x), f(a)) \ge \epsilon)$. Such symbolization
will not be necessary after the reader has had some practice in computing logical
negations; the experienced thinker will intuit the correct negation without a
formal calculation. In any event, we now have a fixed $\epsilon$, and for each $\delta$ of the
form $\delta = 1/n$ we can let $x_n$ be a corresponding x. We then have $\rho(x_n, a) < 1/n$
and $\rho(f(x_n), f(a)) \ge \epsilon$ for all n. The first inequality shows that $x_n \to a$; the
second shows that $f(x_n) \not\to f(a)$. Thus, if f is not continuous at a, then the
sequential condition is not satisfied. □
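The second half of the proof is constructive in spirit: for each delta = 1/n it picks a point x_n that witnesses the failure of continuity. The Python sketch below carries this out for a step function on R that is discontinuous at a = 0. The function step, the value eps = 0.5, and the choice x_n = 1/(2n) are illustrative assumptions, not taken from the text.

```python
def step(x):
    """A function on R that is not continuous at a = 0."""
    return 1.0 if x > 0 else 0.0

a = 0.0
eps = 0.5

# For delta = 1/n choose x_n = 1/(2n): then rho(x_n, a) < 1/n, while
# rho(step(x_n), step(a)) = 1 >= eps for every n.
xs = [1.0 / (2 * n) for n in range(1, 201)]
```

The list xs is the beginning of a sequence converging to a whose image sequence stays at distance 1 from step(a), so the sequential condition fails exactly as the proof predicts.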


The above type of argument is used very frequently and almost amounts to
an automatic proof procedure in the relevant situations. We want to prove, say,
that $(\forall x)(\exists y)(\forall z)P(x, y, z)$. Arguing by contradiction, we suppose this false, so
that $(\exists x)(\forall y)(\exists z)\sim P(x, y, z)$. Then, instead of trying to use all numbers y, we
let y run through some sequence converging to zero, such as $\{1/n\}$, and we choose
one corresponding z, $z_n$, for each such y. We end up with $\sim P(x, 1/n, z_n)$ for the
given x and all n, and we finish by arguing sequentially.

The reader will remember that two norms p and q on a vector space V are
equivalent if and only if the identity map $\xi \mapsto \xi$ is continuous from $\langle V, p\rangle$ to
$\langle V, q\rangle$ and also from $\langle V, q\rangle$ to $\langle V, p\rangle$. By virtue of the above theorem
we now see that:


Theorem 3.3. The norms p and q are equivalent if and only if they yield
exactly the same collection of convergent sequences.


Earlier we argued that a norm on a product $V \times W$ of two normed linear
spaces should be equivalent to $\|\langle\alpha, \xi\rangle\|_1 = \|\alpha\| + \|\xi\|$. Now with respect to
this sum norm it is clear that a sequence $\langle\alpha_n, \xi_n\rangle$ in $V \times W$ converges to
$\langle\alpha, \xi\rangle$ if and only if $\alpha_n \to \alpha$ in V and $\xi_n \to \xi$ in W. We now see (again by
Theorem 3.2) that:


Theorem 3.4. A product norm on $V \times W$ is any norm with the property
that $\langle\alpha_n, \xi_n\rangle \to \langle\alpha, \xi\rangle$ in $V \times W$ if and only if $\alpha_n \to \alpha$ in V and
$\xi_n \to \xi$ in W.


EXERCISES 


3.1 Prove that a convergent sequence in a metric space has a unique limit. That is,
show that if $x_n \to a$ and $x_n \to b$, then $a = b$.

3.2 Show that $x_n \to x$ in the metric space X if and only if $\rho(x_n, x) \to 0$ in $\mathbb{R}$.

3.3 Prove that if $x_n \to a$ in $\mathbb{R}$ and $x_n \ge 0$ for all n, then $a \ge 0$.

3.4 Prove that if $x_n \to 0$ in $\mathbb{R}$ and $|y_n| \le x_n$ for all n, then $y_n \to 0$.

3.5 Give detailed $\epsilon$, N-proofs of Lemmas 3.1 and 3.2.

3.6 By applying Theorem 3.2, prove that if X is a metric space, V is a normed linear
space, and F and G are continuous maps from X to V, then F + G is continuous.
State and prove the similar theorem for a product FG.

3.7 Prove that continuity is preserved under composition by applying Theorem 3.2.

3.8 Show that (the range of) a sequence of points in a metric space is in general not
a closed set. Show that it may be a closed set.

3.9 The fact that in a normed linear space the closure of an open ball includes the
corresponding closed ball is practically trivial on the basis of Lemma 3.2 and Theorem
3.1. Show that this is so.

3.10 Show directly that if the maximum norm $\|\langle\alpha, \xi\rangle\|_\infty = \max\{\|\alpha\|, \|\xi\|\}$ is used
on $V = V_1 \times V_2$, then it is true that
\[ \langle\alpha_n, \xi_n\rangle \to \langle\alpha, \xi\rangle \text{ in } V \]
if and only if
\[ \alpha_n \to \alpha \text{ in } V_1 \quad\text{and}\quad \xi_n \to \xi \text{ in } V_2. \]



3.11 Show that if $\|\;\|$ is any increasing norm on $\mathbb{R}^2$ (see the remark after Theorem 4.3
of Chapter 3), then


\[ \rho(\langle x_1, y_1\rangle, \langle x_2, y_2\rangle) = \|\langle\rho(x_1, x_2), \rho(y_1, y_2)\rangle\| \]

is a metric on the product $X \times Y$ of two metric spaces X and Y.

3.12 In the above exercise show that $\langle x_n, y_n\rangle \to \langle x, y\rangle$ in $X \times Y$ if and only if
$x_n \to x$ in X and $y_n \to y$ in Y. This property would be our minimal requirement for a
product metric.

3.13 Defining a product metric as above, use Theorem 3.2 to show that


\[ \langle f, g\rangle : S \to X \times Y \]


is continuous if and only if $f: S \to X$ and $g: S \to Y$ are both continuous.

3.14 Let X, Y, and Z be metric spaces, and let $f: X \times Y \to Z$ be a mapping such
that f(x, y) is continuous in the variables separately. Suppose also that the continuity
in x is uniform over y. That is, suppose that given $\epsilon$ and $x_0$, there is a $\delta$ such that


\[ \rho(x, x_0) < \delta \Rightarrow \rho(f(x, y), f(x_0, y)) < \epsilon \]


for every value of y. Show that then f is continuous on $X \times Y$.
3.15 Define the function f on the closed unit square $[0, 1] \times [0, 1]$ by


\[ f(0, 0) = 0, \qquad f(x, y) = \frac{xy}{(x + y)^2} \quad\text{if } \langle x, y\rangle \ne \langle 0, 0\rangle. \]


Then f is continuous as a function of x for each fixed value of y, and conversely. Show,
however, that f is not continuous at the origin. That is, find a sequence $\langle x_n, y_n\rangle$
converging to $\langle 0, 0\rangle$ in the plane such that $f(x_n, y_n)$ does not converge to 0. This
example shows that continuity of a function of two variables is a stronger property
than continuity in each variable separately.


4. SEQUENTIAL COMPACTNESS


The reader is probably familiar with the idea of a subsequence. A subsequence of
a sequence $\{x_n\}$ is a new sequence $\{y_m\}$ that is formed by selecting an infinite
number, but generally not all, of the terms $x_n$, and counting them off in the
order of the selected indices. Thus, if $n_1$ is the first selected n, $n_2$ the next, and
so on, and if we set $y_m = x_{n_m}$, then we obtain the subsequence


\[ \{y_1, y_2, \ldots, y_m, \ldots\} = \{x_{n_1}, x_{n_2}, \ldots, x_{n_m}, \ldots\} \quad\text{or}\quad \{y_m\}_m = \{x_{n_m}\}_m. \]


Strictly speaking, this counting off of the selected set of indices n is a sequence
$m \mapsto n_m$ from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ which preserves order: $n_{m+1} > n_m$ for all m. And the
subsequence $m \mapsto x_{n_m}$ is the composition of the sequence $n \mapsto x_n$ and the
selector sequence.

In order to avoid subscripts on subscripts, we may use the notation n(m)
instead of $n_m$. In either case we are being conventionally sloppy: we are using
the same symbol ‘n’ as an integer-valued variable, when we write $x_n$, and as the
selector function, when we write n(m) or $n_m$. This is one of the standard
notational ambiguities which we tolerate in elementary calculus, because the cure is
considered worse than the disease. We could say: let f be a sequence, i.e., a
function from $\mathbb{Z}^+$ to $\mathbb{R}$. Then a subsequence of f is a composition $f \circ g$, where g
is a mapping from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ such that $g(m + 1) > g(m)$ for all m.

If you have grasped the idea of subsequence, you should be able to see that
any infinite sequence of 0’s and 1’s, say $\{0, 1, 0, 0, 1, 0, 0, 0, 1, \ldots\}$, can be
obtained as a subsequence of $\{0, 1, 0, 1, 0, 1, \ldots, [1 + (-1)^n]/2, \ldots\}$.
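The description of a subsequence as a composition of f with a strictly increasing selector g can be carried out explicitly for the 0-1 example. The Python sketch below is an added illustration; the function name selector and the finite prefix target are hypothetical. It builds, for a finite prefix of the desired sequence, strictly increasing indices g(1) < g(2) < ... at which the alternating sequence takes the required values.

```python
def f(n):
    """The alternating sequence 0, 1, 0, 1, ... indexed by n = 1, 2, 3, ..."""
    return (1 + (-1) ** n) // 2

def selector(target):
    """Strictly increasing indices g(1) < g(2) < ... with f(g(m)) = target[m-1]."""
    indices, n = [], 0
    for bit in target:
        n += 1                  # move strictly past the previous index
        if f(n) != bit:
            n += 1              # the very next term has the other value
        indices.append(n)
    return indices

target = [0, 1, 0, 0, 1, 0, 0, 0, 1]
g = selector(target)
```

Since the alternating sequence takes each of the values 0 and 1 at least once in any two consecutive positions, the selector never has to skip more than one term, so the construction extends to any infinite sequence of 0's and 1's.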

If $x_n \to a$, then it should be clear that every subsequence also converges to a.
We leave the details as an exercise. On the other hand, if the sequence $\{x_n\}$
does not converge to a, then there is an $\epsilon$ such that for every N there is some
larger n at which $\rho(x_n, a) \ge \epsilon$. Now we can choose such an n for every N,
taking care that $n_{N+1} > n_N$, and thus choose a subsequence all of whose terms
are at a distance at least $\epsilon$ from a. Then this sequence has no subsequence
converging to a. Thus, if $\{x_n\}$ does not converge to a, then it has a subsequence
no (sub)subsequence of which converges to a. Therefore,


Lemma 4.1. If the sequence $\{x_n\}$ and the point a are such that every
subsequence of $\{x_n\}$ has itself a subsequence that converges to a, then
$x_n \to a$.


This is a wild and unlikely sounding lemma, but we shall use it to prove a 
most important theorem (Theorem 4.2). 


Definition. A subset A of a metric space is sequentially compact if every 
sequence in A has a subsequence that converges to a point of A. 


Here, so to speak, we create convergence out of nothing. One would expect
a compact set to have very powerful properties, and perhaps suspect that there
aren’t many such sets. We shall soon see, however, that every bounded closed
subset of $\mathbb{R}^n$ is compact, and it is in the theory of finite-dimensional spaces that
we most frequently use this notion. Sequential compactness in infinite-dimensional
spaces is a much rarer phenomenon, but when it does occur it is very
important, as we shall see in our brief look at Sturm-Liouville theory in Chapter 6.

We begin with a few simple but important general results. 


Lemma 4.2. If A is a sequentially compact subset of a metric space S,
then A is closed and bounded.


Proof. Suppose that $\{x_n\} \subset A$ and that $x_n \to b$. By the compactness of A
there exists a subsequence $\{x_{n(i)}\}_i$ that converges to a point $a \in A$. But a
subsequence of a convergent sequence converges to the same limit. Therefore,
$a = b$ and $b \in A$. Thus A is closed.

Boundedness here will mean lying in some ball about a given point b. If A
is not bounded, for each n there exists a point $x_n \in A$ such that $\rho(x_n, b) > n$.
By compactness a subsequence $\{x_{n(i)}\}_i$ converges to a point $a \in A$, and


\[ \rho(x_{n(i)}, b) \to \rho(a, b). \]

This clearly contradicts $\rho(x_{n(i)}, b) > n(i) \ge i$. □




Continuous functions carry compact sets into compact sets. The proof of 
the following result is left as an exercise. 


Theorem 4.1. If f is continuous and A is a sequentially compact subset of its
domain, then f[A] is sequentially compact.


A nonempty compact set $A \subset \mathbb{R}$ contains maximum and minimum elements.
This is because lub A is the limit of a sequence in A, and hence belongs to A
itself, since A is closed. Combining this fact with the above theorem, we obtain
the following well-known corollary.


Corollary. If f is a continuous real-valued function and dom (f) is nonempty 
and sequentially compact, then f is bounded and assumes maximum and 
minimum values. 


The following very useful result is related to the above theorem. 


Theorem 4.2. If f is continuous and bijective and dom (f) is sequentially
compact, then $f^{-1}$ is continuous.


Proof. We have to show that if $y_n \to y$ in the range of f, and if $x_n = f^{-1}(y_n)$
and $x = f^{-1}(y)$, then $x_n \to x$. It is sufficient to show that every subsequence
$\{x_{n(i)}\}_i$ has itself a subsequence converging to x (by Lemma 4.1). But, since
dom (f) is compact, there is a subsequence $\{x_{n(i(j))}\}_j$ converging to some z, and
the continuity of f implies that $f(z) = \lim_{j\to\infty} f(x_{n(i(j))}) = \lim_{j\to\infty} y_{n(i(j))} = y$.
Therefore, $z = f^{-1}(y) = x$, which is what we had to prove. Thus $f^{-1}$ is
continuous. □
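The compactness hypothesis cannot simply be dropped. A standard illustration, which is not taken from the text above, wraps the half-open interval [0, 2*pi) around the unit circle: the map is continuous and bijective, but its domain is not closed in R and hence not sequentially compact, and the inverse is discontinuous at the point (1, 0). In the Python sketch below, the names f, f_inverse, y, and ys are illustrative choices.

```python
import math

def f(t):
    """Continuous bijection from [0, 2*pi) onto the unit circle in R^2."""
    return (math.cos(t), math.sin(t))

def f_inverse(point):
    """The inverse map, with values in [0, 2*pi)."""
    return math.atan2(point[1], point[0]) % (2 * math.pi)

y = f(0.0)                                               # the point (1, 0)
ys = [f(2 * math.pi - 1.0 / n) for n in range(1, 200)]   # y_n -> y on the circle
```

The points of ys approach y along the circle, while their preimages approach 2*pi and stay far from f_inverse(y) = 0, so the conclusion of Theorem 4.2 fails for this noncompact domain.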


We now take up the problem of showing that bounded closed sets in $\mathbb{R}^n$ are
compact. We first prove it for $\mathbb{R}$ itself and then give an inductive argument
for $\mathbb{R}^n$.

A sequence $\{x_n\} \subset \mathbb{R}$ is said to be increasing if $x_n \le x_{n+1}$ for all n. It is
strictly increasing if $x_n < x_{n+1}$ for all n. The notions of a decreasing sequence
and a strictly decreasing sequence are obvious. A sequence which is either
increasing or decreasing is said to be monotone. The relevance of these notions here lies
in the following two lemmas.


Lemma 4.3. A bounded monotone sequence in $\mathbb{R}$ is convergent.


Proof. Suppose that $\{x_n\}$ is increasing and bounded above. Let l be the least
upper bound of its range. That is, $x_n \le l$ for all n, but for every $\epsilon$, $l - \epsilon$ is not an
upper bound, and so $l - \epsilon < x_N$ for some N. Then


\[ n > N \Rightarrow l - \epsilon < x_N \le x_n \le l, \]

and so $|x_n - l| < \epsilon$. That is, $x_n \to l$ as $n \to \infty$. □

Lemma 4.4. Any sequence in $\mathbb{R}$ has a monotone subsequence.


Proof. Call $x_n$ a peak term if it is greater than or equal to all later terms. If
there are infinitely many peak terms, then they obviously form a decreasing
subsequence. On the other hand, if there are only finitely many peak terms, then
there is a last one $x_{n_0}$ (or none at all), and then every later term is strictly less
than some other still later term. We choose any $n_1$ greater than $n_0$, and then we
can choose $n_2 > n_1$ so that $x_{n_1} < x_{n_2}$, etc. Therefore, in this case we can choose
a strictly increasing subsequence. We have thus shown that any sequence $\{x_n\}$
in $\mathbb{R}$ has either a decreasing subsequence or a strictly increasing subsequence. □
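The two mechanisms in this proof (collecting peak terms, and climbing from a non-peak term to a later, strictly larger term) can be tried out on a finite list. The Python sketch below is an illustration only: the names peak_indices and climb and the sample list are hypothetical, and a finite list differs from an infinite sequence in that its last term is always a peak, so every climb eventually stops at a peak.

```python
def peak_indices(xs):
    """Indices i such that xs[i] >= every later term of the finite list xs."""
    peaks, running_max = [], float("-inf")
    for i in range(len(xs) - 1, -1, -1):     # scan from the right
        if xs[i] >= running_max:
            peaks.append(i)
            running_max = xs[i]
    return peaks[::-1]

def climb(xs, start):
    """From index start, repeatedly jump to the next strictly larger term."""
    path, i = [start], start
    while True:
        later = [j for j in range(i + 1, len(xs)) if xs[j] > xs[i]]
        if not later:                        # xs[i] is a peak: nothing larger follows
            return path
        i = later[0]
        path.append(i)

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
```

The terms at the peak indices form a decreasing (non-increasing) subsequence, and the terms along a climb form a strictly increasing one, mirroring the two cases of the proof.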


Putting these two lemmas together, we have:

Theorem 4.3. Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.

Now we can generalize to $\mathbb{R}^n$ by induction.


Theorem 4.4. Every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence
(using any product norm, say $\|\;\|_1$).


Proof. The above theorem is the case n = 1. Suppose then that the theorem is
true for n - 1, and let $\{x^m\}_m$ be a bounded sequence in $\mathbb{R}^n$. Thinking of $\mathbb{R}^n$ as
$\mathbb{R}^{n-1} \times \mathbb{R}$, we have $x^m = \langle y^m, z_m\rangle$, and $\{y^m\}_m$ is bounded in $\mathbb{R}^{n-1}$, because if
$x = \langle y, z\rangle$, then $\|x\|_1 = \|y\|_1 + |z| \ge \|y\|_1$. Therefore, there is a subsequence
$\{y^{n(i)}\}_i$ converging to some y in $\mathbb{R}^{n-1}$, by the inductive hypothesis. Since
$\{z_{n(i)}\}_i$ is bounded in $\mathbb{R}$, it has a subsequence $\{z_{n(i(p))}\}_p$ converging to some z in
$\mathbb{R}$. Of course, the corresponding subsubsequence $\{y^{n(i(p))}\}_p$ still converges to y
in $\mathbb{R}^{n-1}$, and then $\{x^{n(i(p))}\}_p$ converges to $x = \langle y, z\rangle$ in $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}$,
since its two component sequences now converge to y and z, respectively. We
have thus found a convergent subsequence of $\{x^m\}$. □


Theorem 4.5. If $A$ is a bounded closed subset of $\mathbb{R}^n$, then $A$ is sequentially
compact (in any product norm). 


Proof. If $\{x_n\} \subset A$, then there is a subsequence $\{x_{n(i)}\}_i$ converging to some $x$
in $\mathbb{R}^n$, by Theorem 4.4, and $x$ is in $A$, since $A$ is closed. Thus $A$ is compact. □


We can now fill in one of the minor gaps in the last chapter. 
Theorem 4.6. All norms on $\mathbb{R}^n$ are equivalent.


Proof. It is sufficient to prove that an arbitrary norm $\|\ \|$ is equivalent to $\|\ \|_1$.
Setting $a = \max \{\|\delta^i\|\}_1^n$, we have

$$\|x\| = \Big\| \sum_1^n x_i \delta^i \Big\| \le \sum_1^n |x_i| \, \|\delta^i\| \le a \|x\|_1,$$

so one of our inequalities is trivial. We also have $\big|\, \|x\| - \|y\| \,\big| \le \|x - y\| \le
a\|x - y\|_1$, so $\|x\|$ is a continuous function on $\mathbb{R}^n$ with respect to the one-norm.
Now the unit one-sphere $S = \{x : \|x\|_1 = 1\}$ is closed and bounded and so
compact (in the one-norm). The restriction of the continuous function $\|x\|$ to
this compact set $S$ has a minimum value $m$, and $m$ cannot be zero because $S$
does not contain the zero vector. We thus have $\|x\| \ge m\|x\|_1$ on $S$, and so
$\|x\| \ge m\|x\|_1$ on $\mathbb{R}^n$, by homogeneity. Altogether we have found positive
constants $a$ and $m$ such that $m\|\ \|_1 \le \|\ \| \le a\|\ \|_1$. □
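
As a numerical illustration of the two constants in this proof, the sketch below takes the arbitrary norm to be the Euclidean norm on $\mathbb{R}^4$. That choice, the helper names, and the random sample are assumptions made for the example only. For the Euclidean norm, $a = 1$, and the minimum $m$ on the unit one-sphere equals $1/\sqrt{n}$ by the Cauchy-Schwarz inequality; the code checks $m\|x\|_1 \le \|x\| \le a\|x\|_1$ on sample vectors, which illustrates the inequalities but proves nothing.

```python
import math
import random


def one_norm(x):
    return sum(abs(t) for t in x)


def euclid_norm(x):
    return math.sqrt(sum(t * t for t in x))


n = 4
# a = max of the norms of the standard basis vectors delta^i.
basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
a = max(euclid_norm(e) for e in basis)
# For the Euclidean norm the minimum on the unit one-sphere is 1/sqrt(n).
m = 1.0 / math.sqrt(n)

random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(1000)]
# Small tolerance guards against floating-point rounding.
ok = all(
    m * one_norm(x) <= euclid_norm(x) + 1e-12
    and euclid_norm(x) <= a * one_norm(x) + 1e-12
    for x in samples
)
```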


Composing with a coordinate isomorphism, we see that all norms on any
finite-dimensional vector space are equivalent.


Corollary. If $M$ is a finite-dimensional subspace of the normed linear space
$V$, then $M$ is a closed subspace of $V$.


Proof. Suppose that $\{\xi_n\} \subset M$ and $\xi_n \to \alpha \in V$. We have to show that $\alpha$ is in
$M$. Now $\{\xi_n\}$ is a bounded subset of $M$, and its closure in $M$ is therefore
sequentially compact, by the theorem. Therefore, some subsequence converges to
a point $\beta$ in $M$ as well as to $\alpha$, and so $\alpha = \beta \in M$. □


EXERCISES 


4.1 Prove by induction that if $f: \mathbb{Z}^+ \to \mathbb{Z}^+$ is such that $f(n + 1) > f(n)$ for all $n$,
then $f(n) \ge n$ for all $n$.

4.2 Prove carefully that if $x_n \to a$ as $n \to \infty$, then $x_{n(m)} \to a$ as $m \to \infty$ for any
subsequence. The above exercise is useful in this proof.

4.3 Prove that if $\{x_n\}$ is an increasing sequence in $\mathbb{R}$ ($x_{n+1} \ge x_n$ for all $n$), and if
$\{x_n\}$ has a convergent subsequence, then $\{x_n\}$ converges.

4.4 Give a more detailed version of the argument that if the sequence $\{x_n\}$ does not
converge to $a$, then there is an $\epsilon$ and a subsequence $\{x_{n(m)}\}_m$ such that $\rho(x_{n(m)}, a) \ge \epsilon$
for all $m$.

4.5 Find a sequence in $\mathbb{R}$ having no convergent subsequence.

4.6 Find a nonconvergent sequence in $\mathbb{R}$ such that the set of limit points of
convergent subsequences consists exactly of the number 1.

4.7 Show that there is a sequence $\{x_n\}$ in $[0, 1]$ such that for any $y \in [0, 1]$ there is a
subsequence $x_{n_m}$ converging to $y$.

4.8 Show that the set of limits of convergent subsequences of a sequence $\{x_n\}$ in a
metric space $X$ is a closed subset of $X$.

4.9 Prove Theorem 4.1. 

4.10 Prove that the Cartesian product of two sequentially compact metric spaces is 
sequentially compact. (The proof is essentially in the text.) 


4.11 A metric space is boundedly compact if every closed bounded set is sequentially
compact. Prove that the Cartesian product of two boundedly compact metric spaces is 
boundedly compact (using, say, the maximum metric on the product space). 


4.12 Prove that the sum $A + B$ of two sequentially compact subsets of a normed
linear space is sequentially compact. 


4.13 Prove that the sum A + B of a closed set and a compact set is closed. 

4.14 Show by an example in R that the sum of two closed sets need not be closed. 
4.15 Let $\{C_n\}$ be a decreasing sequence ($C_{n+1} \subset C_n$ for all $n$) of nonempty closed
subsets of a sequentially compact metric space $S$. Prove that $\bigcap_1^\infty C_n$ is nonempty.

4.16 Give an example of a decreasing sequence $\{C_n\}$ of nonempty closed subsets of
a metric space such that $\bigcap_1^\infty C_n = \emptyset$.


4.17 Suppose the metric space $S$ has the property that every decreasing sequence
$\{C_n\}$ of nonempty closed subsets of $S$ has nonempty intersection. Prove that then $S$
must be sequentially compact. (Hint: Given any sequence $\{x_i\} \subset S$, let $C_n$ be the
closure of $\{x_i : i \ge n\}$.)

4.18 Let $A$ be a sequentially compact subset of a normed linear space $V$, and let $B$ be
obtained from $A$ by drawing all line segments from points of $A$ to the origin (that is,

$$B = \{t\alpha : \alpha \in A \text{ and } t \in [0, 1]\}).$$

Prove that $B$ is compact.

4.19 Show by applying a compactness argument to Lemma 1.5 that if $N$ is a proper
closed subspace of a finite-dimensional vector space $V$, then there exists $\alpha$ in $V$ such
that $\|\alpha\| = \rho(\alpha, N) = 1$.


5. COMPACTNESS AND UNIFORMITY 


The word ‘uniform’ is frequently used as a qualifying adjective in mathematics. 
Roughly speaking, it concerns a “point” property P(y) which may or may not 
hold at each point y in a domain A and whose definition involves an existential 
quantifier. A typical form for $P(y)$ is $(\forall c)(\exists d)Q(y, c, d)$. Thus, if $P(y)$ is ‘$f$ is
continuous at $y$’, then $P(y)$ has the form $(\forall \epsilon)(\exists \delta)Q(y, \epsilon, \delta)$. The property holds
on $A$ if it holds for all $y$ in $A$, that is, if

$$(\forall y^{\in A})[(\forall c)(\exists d)Q(y, c, d)].$$

Here $d$ will, in general, depend both on $y$ and $c$; if either $y$ or $c$ is changed, the
corresponding $d$ may have to be changed. Thus $\delta$ in the definition of continuity
depends both on $\epsilon$ and on the point $y$ at which continuity is being asserted. The
property is said to hold uniformly on $A$, or uniformly in $y$, if a value $d$ can be
found that is independent of $y$ (but still dependent on $c$). Thus the property holds
uniformly in $y$ if

$$(\forall c)(\exists d)(\forall y^{\in A})Q(y, c, d);$$

the uniformity of the property is expressed in the reversal of the order of the
quantifiers $(\forall y^{\in A})$ and $(\exists d)$. Thus $f$ is uniformly continuous on $A$ if

$$(\forall \epsilon)(\exists \delta)(\forall y, z^{\in A})[\rho(y, z) < \delta \Rightarrow \rho(f(y), f(z)) < \epsilon].$$

Now $\delta$ is independent of the point at which continuity is being asserted, but still
dependent on $\epsilon$, of course.

We saw in Section 14 of the last chapter how much more powerful the point 
condition of continuity becomes when it holds uniformly. In the remainder of 
this section we shall discuss some other uniform notions, and shall see that the 
uniform property is often implied by the point property if the domain over which 
it holds is sequentially compact. 

The formal statement forms we have examined above show clearly the
distinction between uniformity and nonuniformity. However, in writing an
argument, we would generally follow our more idiomatic practice of dropping out
the inside universal quantifier. For example, a sequence of functions $\{f_n\} \subset W^A$
converges pointwise to $f: A \to W$ if it converges to $f$ at every point $p$ in $A$, that
is, if for every point $p$ in $A$ and for every $\epsilon$ there is an $N$ such that

$$n > N \Rightarrow \rho(f_n(p), f(p)) \le \epsilon.$$

The sequence converges uniformly on $A$ if an $N$ exists that is independent of $p$,
that is, if for every $\epsilon$ there is an $N$ such that

$$n > N \Rightarrow \rho(f_n(p), f(p)) \le \epsilon \quad \text{for every } p \text{ in } A.$$

When $\rho(\xi, \eta) = \|\xi - \eta\|$, saying that $\rho(f_n(p), f(p)) \le \epsilon$ for all $p$ is the same as
saying that $\|f_n - f\|_\infty \le \epsilon$. Thus $f_n \to f$ uniformly if and only if $\|f_n - f\|_\infty \to 0$;
this is why the norm $\|f\|_\infty$ is called the uniform norm.

Pointwise convergence does not imply uniform convergence. Thus $f_n(x) = x^n$
on $A = (0, 1)$ converges pointwise to the zero function but does not converge
uniformly.
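
A short numerical sketch may make the failure of uniformity concrete. The witness points $x_n = (1/2)^{1/n}$ are a choice made for this illustration, not taken from the text: each lies in $(0, 1)$ and satisfies $f_n(x_n) = 1/2$, so the uniform distance from $f_n$ to the zero function is at least $1/2$ for every $n$, even though $f_n(x) \to 0$ at each fixed $x$.

```python
def f(n, x):
    return x ** n


# Pointwise: at the fixed point x = 0.9 the values f_n(x) shrink toward 0.
pointwise = [f(n, 0.9) for n in (1, 10, 100, 1000)]

# Not uniform: x_n = (1/2)**(1/n) lies in (0, 1) and f_n(x_n) is 1/2
# up to floating-point rounding, whatever n is.
witnesses = [(n, 0.5 ** (1.0 / n)) for n in (1, 10, 100, 1000)]
values = [f(n, x) for n, x in witnesses]
```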

Nor does continuity on $A$ imply uniform continuity. The function $f(x) =
1/x$ is continuous on $(0, 1)$ but is not uniformly continuous. The function
$\sin (1/x)$ is continuous and bounded on $(0, 1)$ but is not uniformly continuous.
Compactness changes the latter situation, however.
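
For $f(x) = 1/x$ the failure can be exhibited with explicit pairs of points. The pairs $x_n = 1/n$, $y_n = 1/(n+1)$ below are an illustrative choice (an assumption of this sketch): their distance $1/(n(n+1))$ tends to zero, while $|f(x_n) - f(y_n)| = 1$ for every $n$, so no single $\delta$ can serve $\epsilon = 1$ on all of $(0, 1)$. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def f(x):
    return 1 / x


# Pairs x_n = 1/n, y_n = 1/(n+1) in (0, 1), for n = 2, ..., 49.
pairs = [(Fraction(1, n), Fraction(1, n + 1)) for n in range(2, 50)]
gaps = [abs(x - y) for x, y in pairs]          # 1/(n(n+1)), tending to 0
jumps = [abs(f(x) - f(y)) for x, y in pairs]   # exactly 1 for every n
```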


Theorem 5.1. If $f$ is continuous on $A$ and $A$ is compact, then $f$ is uniformly
continuous on A. 


Proof. This is one of our “automatic” negation proofs. Uniform continuity
(UC) is the property

$$(\forall \epsilon^{>0})(\exists \delta^{>0})(\forall x, y^{\in A})[\rho(x, y) < \delta \Rightarrow \rho(f(x), f(y)) < \epsilon].$$

Therefore, $\sim$UC $\Leftrightarrow (\exists \epsilon)(\forall \delta)(\exists x, y)[\rho(x, y) < \delta$ and $\rho(f(x), f(y)) \ge \epsilon]$. Take
$\delta = 1/n$, with corresponding $x_n$ and $y_n$. Thus, for all $n$, $\rho(x_n, y_n) < 1/n$ and
$\rho(f(x_n), f(y_n)) \ge \epsilon$, where $\epsilon$ is a fixed positive number. Now $\{x_n\}$ has a
convergent subsequence, say $x_{n(i)} \to x$, by the compactness of $A$. Since

$$\rho(y_{n(i)}, x_{n(i)}) < 1/i,$$

we also have $y_{n(i)} \to x$. By the continuity of $f$ at $x$,

$$\rho(f(x_{n(i)}), f(y_{n(i)})) \le \rho(f(x_{n(i)}), f(x)) + \rho(f(x), f(y_{n(i)})) \to 0,$$

which contradicts $\rho(f(x_{n(i)}), f(y_{n(i)})) \ge \epsilon$. This completes the proof by
negation. □


The compactness of A does not, however, automatically convert the point- 
wise convergence of a sequence of functions on A into uniform convergence. The 
“piecewise linear” functions $f_n: [0, 1] \to [0, 1]$ defined by the graph shown in
Fig. 4.1 converge pointwise to zero on the compact domain $[0, 1]$, but the
convergence is not uniform. (However, see Exercise 5.4.)


[Fig. 4.1: graph of the piecewise linear function $f_n$. Fig. 4.2: graph of a “peak” function.]


We pointed out earlier that the distance between a pair of disjoint closed 
sets may be zero. However, if one of the closed sets is compact, then the distance 
must be positive. 


Theorem 5.2. If $A$ and $C$ are disjoint nonempty closed sets, one of which is
compact, then $\rho(A, C) > 0$.


Proof. The proof is by automatic contradiction, and is left to the reader. 


This result is again a uniformity condition. Saying that a set $A$ is disjoint
from a closed set $C$ is saying that $(\forall x^{\in A})(\exists r^{>0})(B_r(x) \cap C = \emptyset)$. Saying that
$\rho(A, C) > 0$ is saying that $(\exists r^{>0})(\forall x^{\in A}) \ldots$

As a last consequence of sequential compactness, we shall establish a very 
powerful property which is taken as the definition of compactness in general 
topology. First, however, we need some preparatory work. If A is a subset of a 
metric space $S$, the $r$-neighborhood of $A$, $B_r[A]$, is simply the union of all the balls
of radius $r$ about points of $A$:

$$B_r[A] = \bigcup \{B_r(a) : a \in A\} = \{x : (\exists a^{\in A})(\rho(x, a) < r)\}.$$

A subset $A \subset S$ is $r$-dense in $S$ if $S \subset B_r[A]$, that is, if each point of $S$ is closer
than $r$ to some point of $A$.

A subset $A$ of a metric space $S$ is dense in $S$ if $\bar{A} = S$. This is the same as
saying that for every point $p$ in $S$ there are points of $A$ arbitrarily close to $p$.
The set $\mathbb{Q}$ of all rational numbers is a dense subset of the real number system $\mathbb{R}$,
because any irrational real number $x$ can be arbitrarily closely approximated by
rational numbers. Since we do arithmetic in decimal notation, it is customary to
use decimal approximations, and if $0 < x < 1$ and the decimal expansion of
$x$ is $x = \sum_1^\infty a_n/10^n$, where each $a_n$ is an integer and $0 \le a_n < 10$, then
$\sum_1^N a_n/10^n$ is a rational number differing from $x$ by less than $10^{-N}$. Note that $A$
is a dense subset of $B$ if and only if $A$ is $r$-dense in $B$ for every positive $r$.

A set $B$ is said to be totally bounded if for every positive $r$ there is a finite set
which is $r$-dense in $B$. Thus for every positive $r$ the set $B$ can be covered by a
finite number of balls of radius $r$. For example, the $n - 1$ numbers $\{i/n\}_1^{n-1}$ are
$(1/n)$-dense in the open interval $(0, 1)$ for each $n$, and so $(0, 1)$ is totally bounded.
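
The example can be spot-checked by machine. In the sketch below, the helper `is_r_dense`, the value $n = 7$, and the finite sample standing in for $(0, 1)$ are all assumptions of the illustration; a finite sample can only suggest, not prove, that the points $i/n$ are $(1/n)$-dense.

```python
from fractions import Fraction


def is_r_dense(net, points, r):
    """True if every listed point is at distance < r from some element of net."""
    return all(any(abs(p - q) < r for q in net) for p in points)


n = 7
net = [Fraction(i, n) for i in range(1, n)]            # the n - 1 numbers i/n
# A finite sample of (0, 1); a proof must of course treat every point.
sample = [Fraction(k, 1000) for k in range(1, 1000)]
dense = is_r_dense(net, sample, Fraction(1, n))
# A single net point is not enough at the same radius.
too_coarse = is_r_dense(net[:1], sample, Fraction(1, n))
```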


Total boundedness is a much stronger property than boundedness, as the 
following lemma shows. 


Lemma 5.1. If the normed linear space $V$ is infinite-dimensional, then its
closed unit ball $B_1 = \{\xi : \|\xi\| \le 1\}$ cannot be covered by a finite number
of balls of radius $\frac{1}{4}$.


Proof. Since $V$ is not finite-dimensional, we can choose a sequence $\{\alpha_n\}$ such
that $\alpha_{n+1}$ is not in the linear span $M_n$ of $\{\alpha_1, \ldots, \alpha_n\}$, for each $n$. Since $M_n$ is
closed in $V$, by the corollary of Theorem 4.6, we can apply Lemma 1.5 to find
a vector $\xi_n$ in $M_n$ such that $\|\xi_n\| = 1$ and $\rho(\xi_n, M_{n-1}) > \frac{1}{2}$ for all $n > 1$.
We take $\xi_1 = \alpha_1/\|\alpha_1\|$, and we have a sequence $\{\xi_n\} \subset B_1$ such that

$$\|\xi_m - \xi_n\| > \tfrac{1}{2}$$

if $m \ne n$. Then no ball of radius $\frac{1}{4}$ can contain more than one $\xi_i$, proving the
lemma. □


For a concrete example, let $V$ be $\mathcal{C}([0, 1])$, and let $f_n$ be the “peak” function
sketched in Fig. 4.2, where the three points on the base are $1/(2n + 2)$, $1/(2n + 1)$,
and $1/2n$. Then $f_{n+1}$ is “disjoint” from $f_n$ (that is, $f_{n+1} f_n = 0$), and we have
$\|f_n\|_\infty = 1$ for all $n$ and $\|f_n - f_m\|_\infty = 1$ if $n \ne m$. Thus no ball of radius $\frac{1}{4}$ can
contain more than one of the functions $f_n$, and accordingly the closed unit ball in
$V$ cannot be covered by a finite number of balls of radius $\frac{1}{4}$.


Lemma 5.2. Every sequentially compact set $A$ is totally bounded.


Proof. If $A$ is not totally bounded, then there exists an $r$ such that no finite
subset $F$ is $r$-dense in $A$. We can then define a sequence $\{p_n\}$ inductively by
taking $p_1$ as any point of $A$, $p_2$ as any point of $A$ not in $B_r(p_1)$, and $p_n$ as any
point of $A$ not in $B_r[\bigcup_1^{n-1} p_i] = \bigcup_1^{n-1} B_r(p_i)$. Then $\{p_n\}$ is a sequence in $A$
such that $\rho(p_i, p_j) \ge r$ for all $i \ne j$. But this sequence can have no convergent
subsequence. Thus, if $A$ is not totally bounded, then $A$ is not sequentially
compact, proving the lemma. □
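
The inductive selection in this proof is a greedy procedure, and it can be run on a finite point set. The sketch below is an illustration under assumptions: the function name `greedy_separated`, the grid in the unit square, and the radius $0.3$ are hypothetical choices. On a finite set the selection necessarily stops, and the chosen points are then both $r$-separated and $r$-dense; the lemma concerns the opposite situation, in which the selection never stops.

```python
import math


def greedy_separated(points, r):
    """Pick points one at a time, each at distance >= r from all earlier picks."""
    chosen = []
    for p in points:
        if all(math.dist(p, q) >= r for q in chosen):
            chosen.append(p)
    return chosen


# Hypothetical data: a 21 x 21 grid in the unit square.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
r = 0.3
net = greedy_separated(grid, r)

# The picks are r-separated, and every grid point is within r of some pick.
separated = all(math.dist(p, q) >= r for i, p in enumerate(net) for q in net[:i])
covered = all(any(math.dist(p, q) < r for q in net) for p in grid)
```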


Corollary. A normed linear space V is finite-dimensional if and only if its 
closed unit ball is sequentially compact. 


Proof. This follows from Theorem 4.4 in one direction and from the above two 
lemmas in the other direction. □


Lemma 5.3. Suppose that $A$ is sequentially compact and that $\{E_i : i \in I\}$
is an open covering of $A$ (that is, $\{E_i\}$ is a family of open sets and $A \subset \bigcup_i E_i$).
Then there exists an $r > 0$ with the property that for every point $p$ in $A$ the
ball $B_r(p)$ is included in some $E_j$.


Proof. Otherwise, for every $r$ there is a point $p$ in $A$ such that $B_r(p)$ is not a
subset of any $E_j$. Take $r = 1/n$, with corresponding sequence $\{p_n\}$. Thus $B_{1/n}(p_n)$
is not a subset of any $E_j$. Since $A$ is sequentially compact, $\{p_n\}$ has a convergent
subsequence, $p_{n(m)} \to p$ as $m \to \infty$. Since $\{E_i\}$ covers $A$, some $E_j$ contains $p$,
and then $B_\epsilon(p) \subset E_j$ for some $\epsilon > 0$, since $E_j$ is open. Taking $m$ large enough so
that $1/m < \epsilon/2$ and also $\rho(p_{n(m)}, p) < \epsilon/2$, we have

$$B_{1/n(m)}(p_{n(m)}) \subset B_\epsilon(p) \subset E_j,$$

contradicting the fact that $B_{1/n}(p_n)$ is not a subset of any $E_j$. The lemma has
thus been proved. □


Theorem 5.3. If $\mathcal{F}$ is an open covering of a sequentially compact set $A$,
then some finite subfamily of $\mathcal{F}$ covers $A$.


Proof. By the lemma immediately above there exists an $r > 0$ such that for
every $p$ in $A$ the ball $B_r(p)$ lies entirely in some set of $\mathcal{F}$, and by Lemma 5.2
there exist $p_1, \ldots, p_n$ in $A$ such that $A \subset \bigcup_1^n B_r(p_i)$. Taking corresponding sets
$E_i$ in $\mathcal{F}$ such that $B_r(p_i) \subset E_i$ for $i = 1, \ldots, n$, we clearly have $A \subset \bigcup_1^n E_i$. □


In general topology, a set A such that every open covering of A includes a 
finite covering is said to be compact or to have the Heine-Borel property. The 
above theorem says that in a metric space every sequentially compact set is 
compact. We shall see below that the reverse implication also holds, so that the 
two notions are in fact equivalent on a metric space.


Theorem 5.4. If A is a compact metric space, then A is sequentially 
compact. 


Proof. Let $\{x_n\}$ be any sequence in $A$, and let $\mathcal{F}$ be the collection of open balls $B$
such that $B$ contains only finitely many $x_i$. If $\mathcal{F}$ were to cover $A$, then by
compactness $A$ would be the union of finitely many balls in $\mathcal{F}$, and this would clearly
imply that the whole of $A$ contains only finitely many $x_i$, contradicting the fact
that $\{x_i\}$ is an infinite sequence. Therefore, $\mathcal{F}$ does not cover $A$, and so there is a
point $x$ in $A$ such that every ball about $x$ contains infinitely many of the $x_i$.
More precisely, every ball about $x$ contains $x_i$ for infinitely many indices $i$. It can
now be safely left to the reader to see that a subsequence of $\{x_n\}$ converges to $x$. □


EXERCISES 


5.1 Show that $f_n(x) = x^n$ does not converge uniformly on $(0, 1)$.

5.2 Show that $f(x) = 1/x$ is not uniformly continuous on $(0, 1)$.

5.3 Define the notion of a function $K: X \times Y \to Y$ being uniformly Lipschitz in its
second variable over its first variable.

5.4 Let $S$ be a sequentially compact metric space, and let $\{f_n\}$ be a sequence of
continuous real-valued functions on $S$ that decreases pointwise to zero (that is, $\{f_n(p)\}$
is a decreasing sequence in $\mathbb{R}$ and $f_n(p) \to 0$ as $n \to \infty$ for each $p$ in $S$). Prove that the
convergence is uniform. (Try to apply Exercise 4.15.)

5.5 Restate the corollaries of Theorems 15.1 and 15.2 of Chapter 3, employing the
weaker hypotheses that suffice by virtue of Theorem 5.1 of the present section.


5.6 Prove Theorem 5.2. 

5.7 Prove that if $A$ is an $r$-dense subset of a set $X$ in a normed linear space $V$, and
if $B$ is an $s$-dense subset of a set $Y \subset V$, then $A + B$ is $(r + s)$-dense in $X + Y$.
Conclude that the sum of two totally bounded subsets of $V$ is totally bounded.

5.8 Suppose that the $n$ points $\{p_i\}_1^n$ are $r$-dense in a metric space $X$. Let $A$ be any
subset of $X$. Show that $A$ has a subset of at most $n$ points that is $2r$-dense in $A$.
Conclude that any subset of a totally bounded metric space is itself totally bounded.

5.9 Prove that the Cartesian product of two totally bounded metric spaces is totally 
bounded. 

5.10 Show that if a metric space $X$ has a dense subset $A$ that is totally bounded, then
$X$ is totally bounded.

5.11 Show that if two continuous mappings $f$ and $g$ from a metric space $X$ to a metric
space $Y$ are equal on a dense subset of $X$, then they are equal everywhere.

5.12 Write out in explicit quantified form involving the existence of balls the
statement that the interiors of the sets $\{A_i\}$ cover the metric space $A$. Then show that the
conclusion of Lemma 5.3 is another uniformity assertion.

5.13 Reprove the theorem that a continuous function on a compact domain is
bounded on the basis of Theorem 5.3.

5.14 Reprove the theorem that a continuous function on a compact domain is
uniformly continuous from Theorem 5.3.


6. EQUICONTINUITY 


The application of sequential compactness that we shall make in an
infinite-dimensional context revolves around the notion of an equicontinuous family of
functions. If $A$ and $B$ are metric spaces, then a subset $\mathcal{F} \subset B^A$ is said to be
equicontinuous at $p_0$ in $A$ if all the functions of $\mathcal{F}$ are continuous at $p_0$ and if
given $\epsilon$, there is a $\delta$ which works for them all, i.e., such that

$$\rho(p, p_0) < \delta \Rightarrow \rho(f(p), f(p_0)) < \epsilon \quad \text{for every } f \text{ in } \mathcal{F}.$$

The family $\mathcal{F}$ is uniformly equicontinuous if $\delta$ is also independent of $p_0$, and so is
dependent only on $\epsilon$. Our quantifier string is thus $(\forall \epsilon)(\exists \delta)(\forall p, q^{\in A})(\forall f^{\in \mathcal{F}})$.

For example, given $m > 0$, let $\mathcal{F}$ be a collection of functions $f$ from $(0, 1)$ to
$(0, 1)$ such that $f'$ exists and $|f'| \le m$ on $(0, 1)$. Then $|f(x) - f(y)| \le m|x - y|$,
by the ordinary mean-value theorem. Therefore, given any $\epsilon$, we can take
$\delta = \epsilon/m$ and have

$$|x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon$$

for all $x, y \in (0, 1)$ and all $f \in \mathcal{F}$. The collection $\mathcal{F}$ is thus uniformly
equicontinuous.


Theorem 6.1. If $A$ and $B$ are totally bounded metric spaces, and if $\mathcal{F}$ is a
uniformly equicontinuous subfamily of $B^A$, then $\mathcal{F}$ is totally bounded in the
uniform metric.


Proof. Given $\epsilon > 0$, choose $\delta$ so that for all $f$ in $\mathcal{F}$ and all $p_1, p_2$ in $A$, $\rho(p_1, p_2) <
\delta \Rightarrow \rho(f(p_1), f(p_2)) < \epsilon/4$. Let $D$ be a finite subset of $A$ which is $\delta$-dense in $A$,
and let $E$ be a finite subset of $B$ which is $(\epsilon/4)$-dense in $B$. Let $G$ be the set $E^D$ of
all functions on $D$ into $E$. $G$ is of course finite; in fact, $\#G = n^m$, where $m = \#D$
and $n = \#E$. Finally, for each $g \in G$ let $\mathcal{F}_g$ be the set of all functions $f \in \mathcal{F}$ such
that

$$\rho(f(p), g(p)) < \epsilon/4 \quad \text{for every } p \in D.$$

We claim that the collections $\mathcal{F}_g$ cover $\mathcal{F}$ and that each $\mathcal{F}_g$ has diameter at most $\epsilon$.
We will then obtain a finite $\epsilon$-dense subset of $\mathcal{F}$ by choosing one function from
each nonempty $\mathcal{F}_g$, and the theorem will be proved.

To show that every $f \in \mathcal{F}$ is in some $\mathcal{F}_g$, we simply construct a suitable $g$.
For each $p$ in $D$ there exists a $q$ in $E$ whose distance from $f(p)$ is less than $\epsilon/4$.
If we choose one such $q$ in $E$ for each $p$ in $D$, we have a function $g$ in $G$ such that
$f \in \mathcal{F}_g$.

The final thing we have to show is that if $f, h \in \mathcal{F}_g$, then $\rho(f, h) \le \epsilon$. Since
$\rho(h, g) < \epsilon/4$ on $D$ and $\rho(f, g) < \epsilon/4$ on $D$, it follows that

$$\rho(f(p), h(p)) < \epsilon/2 \quad \text{for every } p \in D.$$

Then for any $p' \in A$ we have only to choose $p \in D$ such that $\rho(p', p) < \delta$, and
we have

$$\rho(f(p'), h(p')) \le \rho(f(p'), f(p)) + \rho(f(p), h(p)) + \rho(h(p), h(p')) < \epsilon/4 + \epsilon/2 + \epsilon/4 = \epsilon. \quad □$$

The above proof is a good example of a mathematical argument that is
completely elementary but hard. When referring to mathematical reasoning, the
words ‘sophisticated’ and ‘difficult’ are by no means equivalent.


7. COMPLETENESS 


If $x_n \to a$ as $n \to \infty$, then the terms $x_n$ obviously get close to each other as $n$
gets large. On the other hand, if $\{x_n\}$ is a sequence whose terms get arbitrarily
close to each other as $n \to \infty$, then $\{x_n\}$ clearly ought to converge to a limit.
It may not, however; the desired limit point may be missing from the space.
If a metric space $S$ is such that every sequence which ought to converge actually
does converge, then we say that $S$ is complete. We now make this notion precise.
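
A familiar instance of a missing limit point can be sketched numerically in the rational numbers. The example below is an assumption of this illustration, not taken from the text: it runs Newton's iteration $x \mapsto (x + 2/x)/2$ from $x = 1$ in exact rational arithmetic. Every iterate is rational, the consecutive gaps shrink, and the squares approach 2 without ever equalling it. Shrinking consecutive gaps are only suggestive here; they do not by themselves establish the Cauchy property.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational Newton iterates x -> (x + 2/x)/2, starting from 1."""
    x = Fraction(1)
    out = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        out.append(x)
    return out


xs = newton_sqrt2(6)
gaps = [abs(xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
errors = [abs(x * x - 2) for x in xs]   # positive for every rational iterate
```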


Definition. $\{x_n\}$ is a Cauchy sequence if for every $\epsilon$ there is an $N$ such that
$m > N$ and $n > N \Rightarrow \rho(x_m, x_n) < \epsilon$.

Lemma 7.1. If $\{x_n\}$ is convergent, then $\{x_n\}$ is Cauchy.


Proof. Given $\epsilon$, we choose $N$ such that $n > N \Rightarrow \rho(x_n, a) < \epsilon/2$, where $a$ is
the limit of the sequence. Then if $m$ and $n$ are both greater than $N$, we have

$$\rho(x_m, x_n) \le \rho(x_m, a) + \rho(a, x_n) < \epsilon/2 + \epsilon/2 = \epsilon. \quad □$$


Lemma 7.2. If $\{x_n\}$ is Cauchy, and if a subsequence is convergent, then
$\{x_n\}$ itself converges.


Proof. Suppose that $x_{n(i)} \to a$ as $i \to \infty$. Given $\epsilon$, we take $N$ so that $m, n > N \Rightarrow
\rho(x_n, x_m) < \epsilon$. Because $x_{n(i)} \to a$ as $i \to \infty$, we can choose an $i$ such that
$n(i) > N$ and $\rho(x_{n(i)}, a) < \epsilon$. Thus if $m > N$, we have

$$\rho(x_m, a) \le \rho(x_m, x_{n(i)}) + \rho(x_{n(i)}, a) < 2\epsilon,$$

and so $x_m \to a$. □


Actually, of course, if $m, n > N \Rightarrow \rho(x_m, x_n) < \epsilon$, and if $x_n \to a$, then for
any $m > N$ it is true that $\rho(x_m, a) \le \epsilon$. Why?


Lemma 7.3. If $A$ and $B$ are metric spaces, and if $T$ is a Lipschitz mapping
from $A$ to $B$, then $T$ carries Cauchy sequences in $A$ into Cauchy sequences in
$B$. This is true in particular if $A$ and $B$ are normed linear spaces and $T$ is an
element of Hom$(A, B)$.


Proof. Let $\{x_n\}$ be a Cauchy sequence in $A$, and set $y_n = T(x_n)$. Given $\epsilon$,
choose $N$ so that $m, n > N \Rightarrow \rho(x_m, x_n) < \epsilon/C$, where $C$ is a Lipschitz constant
for $T$. Then

$$m, n > N \Rightarrow \rho(y_m, y_n) = \rho(T(x_m), T(x_n)) \le C\rho(x_m, x_n) < C\epsilon/C = \epsilon. \quad □$$

This lemma has a substantial generalization, as follows.


Theorem 7.1. If $A$ and $B$ are metric spaces, $\{x_n\}$ is Cauchy in $A$, and
$F: A \to B$ is uniformly continuous, then $\{F(x_n)\}$ is Cauchy in $B$.


Proof. The proof is left as an exercise.


The student should try to acquire a good intuitive feel for the truth of these
lemmas, after which the technical proofs become more or less obvious.


Definition. A metric space $A$ is complete if every Cauchy sequence in $A$
converges to a limit in A. A complete normed linear space is called a Banach 
space. 


We are now going to list some important examples of Banach spaces. In 
each case a proof is necessary, so the list becomes a collection of theorems. 


Theorem 7.2. $\mathbb{R}$ is complete.


Proof. Let $\{x_n\}$ be Cauchy in $\mathbb{R}$. Then $\{x_n\}$ is bounded (why?) and so, by
Theorem 4.3, has a convergent subsequence. Lemma 7.2 then implies that
$\{x_n\}$ is convergent. □


Theorem 7.3. If $A$ is a complete metric space, and if $f$ is a continuous
bijective mapping from $A$ to a metric space $B$ such that $f^{-1}$ is Lipschitz
continuous, then $B$ is complete. In particular, if $V$ is a Banach space, and if
$T$ in Hom$(V, W)$ is invertible, then $W$ is a Banach space.


Proof. Suppose that $\{y_i\}$ is a Cauchy sequence in $B$, and set $x_i = f^{-1}(y_i)$ for
all $i$. Then $\{x_i\}$ is Cauchy in $A$, by Lemma 7.3, and so converges to some $x$ in $A$,
since $A$ is complete. But then $y_i = f(x_i) \to f(x)$, because $f$ is continuous.
Thus every Cauchy sequence in $B$ is convergent and $B$ is complete. □


The Banach space assertion is a special case, because the invertibility of $T$
means that $T^{-1}$ exists in Hom$(W, V)$ and hence is a Lipschitz mapping.


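Corollary. If $p$ and $q$ are equivalent norms on $V$ and $\langle V, p \rangle$ is complete,
then so is $\langle V, q \rangle$.


Theorem 7.4. If $V_1$ and $V_2$ are Banach spaces, then so is $V_1 \times V_2$.


Proof. If $\{\langle \xi_n, \eta_n \rangle\}$ is Cauchy, then so are each of $\{\xi_n\}$ and $\{\eta_n\}$ (by
Lemma 7.3, since the projections $\pi_i$ are bounded). Then $\xi_n \to \alpha$ and $\eta_n \to \beta$
for some $\alpha \in V_1$ and $\beta \in V_2$. Thus $\langle \xi_n, \eta_n \rangle \to \langle \alpha, \beta \rangle$ in $V_1 \times V_2$. (See
Theorem 3.4.) □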


Corollary 1. If $\{V_i\}_1^n$ are Banach spaces, then so is $\prod_{i=1}^n V_i$.


Corollary 2. Every finite-dimensional vector space is a Banach space
(in any norm).


Proof. $\mathbb{R}^n$ is complete (in the one-norm, say) by Theorem 7.2 and Corollary 1
above. We then impose a one-norm on $V$ by choosing a basis, and apply the
corollary of Theorem 7.3 to pass to any other norm. □


Theorem 7.5. Let $W$ be a Banach space, let $A$ be any set, and let $\mathcal{B}(A, W)$
be the vector space of all bounded functions from $A$ to $W$ with the uniform
norm $\|f\|_\infty = \operatorname{lub} \{\|f(a)\| : a \in A\}$. Then $\mathcal{B}(A, W)$ is a Banach space.


Proof. Let $\{f_n\}$ be Cauchy, and choose any $a \in A$. Since $\|f_n(a) - f_m(a)\| \le
\|f_n - f_m\|_\infty$, it follows that $\{f_n(a)\}$ is Cauchy in $W$ and so convergent. Define
$g: A \to W$ by $g(a) = \lim f_n(a)$ for each $a \in A$. We have to show that $g$ is
bounded and that $f_n \to g$.

Given $\epsilon$, we choose $N$ so that $m, n > N \Rightarrow \|f_m - f_n\|_\infty < \epsilon$. Then

$$\|f_m(a) - g(a)\| = \lim_{n \to \infty} \|f_m(a) - f_n(a)\| \le \epsilon.$$

Thus, if $m > N$, then $\|f_m(a) - g(a)\| \le \epsilon$ for all $a \in A$, and hence $\|f_m - g\|_\infty \le
\epsilon$. This implies both that $f_m - g \in \mathcal{B}(A, W)$, and so

$$g = f_m - (f_m - g) \in \mathcal{B}(A, W),$$

and that $f_n \to g$ in the uniform norm. □


Theorem 7.6. If $V$ is a normed linear space and $W$ is a Banach space, then
Hom$(V, W)$ is a Banach space.


The method of proof is identical to that of the preceding theorem, and we
leave it as an exercise. Boundedness here has a different meaning, but it is used
in essentially the same way. One additional fact has to be established, namely,
that the limit map (corresponding to $g$ in the above theorem) is linear.


Theorem 7.7. A closed subset of a complete metric space is complete. A 
complete subset of any metric space is closed. 


Proof. The proof is left to the reader. 


It follows from Theorem 7.7 that a complete metric space A is absolutely 
closed, in the sense that no matter how we extend A to a larger metric space 
B, A is always a closed subset of B. Actually, this property is equivalent to 
completeness, for if A is not complete, then a very important construction of 
metric space theory shows that A can be completed. That is, we can construct
a complete metric space B which includes A. Now, if A is not complete, then 
the closure of A in B, being complete, is different from A, and A is not absolutely 
closed. 

See Exercises 7.21 through 7.23 for a construction of the completion of a
metric space. The completion of a normed linear space is of course a Banach 
space. 


Theorem 7.8. In the context of Theorem 7.5, let $A$ be a metric space, let
$\mathcal{C}(A, W)$ be the space of continuous functions from $A$ to $W$, and set

$$\mathcal{BC}(A, W) = \mathcal{B}(A, W) \cap \mathcal{C}(A, W).$$

Then $\mathcal{BC}$ is a closed subspace of $\mathcal{B}$.


Fig. 4.3 


Proof. We suppose that $\{f_n\} \subset \mathcal{BC}$ and that $\|f_n - g\|_\infty \to 0$, where $g \in \mathcal{B}$.
We have to show that $g$ is continuous. This is an application of a much used
“up, over, and down” argument, which can be schematically indicated as in
Fig. 4.3.

Given $\epsilon$, we first choose any $n$ such that $\|f_n - g\|_\infty < \epsilon/3$. Consider now
any $a \in A$. Since $f_n$ is continuous at $a$, there exists a $\delta$ such that

$$\rho(x, a) < \delta \Rightarrow \|f_n(x) - f_n(a)\| < \epsilon/3.$$


Then

$$\rho(x, a) < \delta \Rightarrow \|g(x) - g(a)\| \le \|g(x) - f_n(x)\| + \|f_n(x) - f_n(a)\| + \|f_n(a) - g(a)\| < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon.$$

Thus $g$ is continuous at $a$ for every $a \in A$, and so $g \in \mathcal{BC}$. □


This important classical result is traditionally stated as follows: The limit of 
a uniformly convergent sequence of continuous functions is continuous. 


Remark. The proof was slightly more general. We actually showed that if
$f_n \to f$ uniformly, and if each $f_n$ is continuous at $a$, then $f$ is continuous at $a$.


Corollary. $\mathcal{BC}(A, W)$ is a Banach space.


Theorem 7.9. If A is a sequentially compact metric space, then A is com- 
plete. 


Proof. A Cauchy sequence in A has a subsequence converging to a limit in A, 
and therefore, by Lemma 7.2, itself converges to that limit. Thus $A$ is complete. □


In Section 5 we proved that a compact set is also totally bounded. It can be 
shown, conversely, that a complete, totally bounded set A is sequentially com- 
pact, so that these two properties together are equivalent to compactness. 

The crucial fact is that if A is totally bounded, then every sequence in A 
has a Cauchy subsequence. If A is also complete, this Cauchy subsequence will 
converge to a point of A. Thus the fact that total boundedness and completeness
together are equivalent to compactness follows directly from the next
lemma. 


Lemma 7.4. If A is totally bounded, then every sequence in A has a Cauchy 
subsequence. 


Proof. Let $\{p_m\}$ be any sequence in $A$. Since $A$ can be covered by a finite
number of balls of radius 1, at least one ball in such a covering contains infinitely
many of the points $\{p_m\}$. More precisely, there exists an infinite set $M_1 \subset \mathbb{Z}^+$
such that the set $\{p_m : m \in M_1\}$ lies in a single ball of radius 1. Suppose that
$M_1, \ldots, M_n \subset \mathbb{Z}^+$ have been defined so that $M_{i+1} \subset M_i$ for $i = 1, \ldots, n - 1$,
$M_n$ is infinite, and $\{p_m : m \in M_i\}$ is a subset of a ball of radius $1/i$ for $i = 1, \ldots, n$.
Since $A$ can be covered by a finite family of balls of radius $1/(n + 1)$, at least
one covering ball contains infinitely many points of the set $\{p_m : m \in M_n\}$. More
precisely, there exists an infinite set $M_{n+1} \subset M_n$ such that $\{p_m : m \in M_{n+1}\}$
is a subset of a ball of radius $1/(n + 1)$. We thus define an infinite sequence
$\{M_n\}$ of subsets of $\mathbb{Z}^+$ having the above properties.

Now choose $m_1 \in M_1$, $m_2 \in M_2$ so that $m_2 > m_1$, and, in general, $m_{n+1} \in
M_{n+1}$ so that $m_{n+1} > m_n$. Then the subsequence $\{p_{m_n}\}_n$ is Cauchy. For
given $\epsilon$, we can choose $n$ so that $1/n < \epsilon/2$. Then $i, j > n \Rightarrow m_i, m_j \in M_n \Rightarrow
\rho(p_{m_i}, p_{m_j}) < 2(1/n) < \epsilon$. This proves the lemma, and our theorem is a
corollary. □


Theorem 7.10. A metric space $S$ is sequentially compact if and only if $S$ is
totally bounded and complete.


The next three sections will be devoted to applications of completeness to 
the calculus, but before embarking on these vital matters we should say a few 
words about infinite series. As in the ordinary calculus, if $\{\xi_n\}$ is a sequence in a
normed linear space $V$, we say that the series $\sum \xi_i$ converges and has the sum $\alpha$,
and write $\sum_1^\infty \xi_i = \alpha$, if the sequence of partial sums converges to $\alpha$. This means
that $\sigma_n \to \alpha$ as $n \to \infty$, where $\sigma_n$ is the finite sum $\sum_1^n \xi_i$ for each $n$. We say that
$\sum \xi_i$ converges absolutely if the series of norms $\sum \|\xi_i\|$ converges in $\mathbb{R}$. This is
abuse of language unless it is true that every absolutely convergent series
converges, and the importance of the notion stems from the following theorem.


Theorem 7.11. If $V$ is a Banach space, then every absolutely convergent
series in V is convergent. 


Proof. Let $\sum \xi_i$ be absolutely convergent. This means that $\sum \|\xi_i\|$ converges in
$\mathbb{R}$, i.e., that the sequence $\{s_n\}$ converges in $\mathbb{R}$, where $s_n = \sum_1^n \|\xi_i\|$. If $m < n$,
then

$$\|\sigma_n - \sigma_m\| = \Big\| \sum_{m+1}^n \xi_i \Big\| \le \sum_{m+1}^n \|\xi_i\| = s_n - s_m.$$

Since $\{s_n\}$ is Cauchy in $\mathbb{R}$, this inequality shows that $\{\sigma_n\}$ is Cauchy in $V$ and
therefore, because $V$ is complete, that $\{\sigma_n\}$ is convergent in $V$. That is, $\sum \xi_i$ is
convergent in $V$. □


The reader will be asked to show in an exercise that, conversely, if a normed 
linear space V is such that every absolutely convergent series converges, then V 
is complete. This property therefore characterizes Banach spaces. 

We shall make frequent use of the above theorem. For the moment we 
note just one corollary, the classical Weierstrass comparison test. 


Corollary. If $\{f_n\}$ is a sequence of bounded real-valued (or $W$-valued, for
some Banach space $W$) functions on a common domain $A$, and if there is a
sequence $\{M_n\}$ of positive constants such that $\sum M_n$ is convergent and
$\|f_n\|_\infty \le M_n$ for each $n$, then $\sum f_n$ is uniformly convergent.


Proof. The hypotheses imply that $\sum \|f_n\|_\infty$ converges, and so $\sum f_n$ converges in
the Banach space $\mathcal{B}(A, W)$ by the theorem. But convergence in $\mathcal{B}(A, W)$ is
uniform convergence. □
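
The comparison test can be checked numerically on a concrete series. The sketch below is only an illustration: the functions $f_n(x) = \sin(nx)/n^2$, the majorants $M_n = 1/n^2$, and the sample grid are choices made here and do not come from the text. It compares the largest gap between two partial sums on the grid with the corresponding tail of $\sum M_n$.

```python
import math

def partial_sum(x, N):
    """N-th partial sum of sum_{n>=1} sin(n x) / n^2 at the point x (illustrative series)."""
    return sum(math.sin(n * x) / n**2 for n in range(1, N + 1))

def tail_of_bounds(M, N):
    """Sum of the majorants M_n = 1/n^2 for M < n <= N."""
    return sum(1.0 / n**2 for n in range(M + 1, N + 1))

# Sample points in [0, 2*pi]; the estimate holds at every x, not only at these.
grid = [2 * math.pi * k / 200 for k in range(201)]

M, N = 50, 400
# Largest observed difference between the N-th and M-th partial sums.
sup_gap = max(abs(partial_sum(x, N) - partial_sum(x, M)) for x in grid)
bound = tail_of_bounds(M, N)

print(sup_gap <= bound)
```

Because the bound does not depend on $x$, the partial sums are uniformly Cauchy, which is exactly what the corollary asserts.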


EXERCISES 


7.1 Prove that a Cauchy sequence in a metric space is a bounded set. 
7.2 Let V be a normed linear space. Prove that the sum of two Cauchy sequences 
in V is Cauchy. 


7.3 Show also that if $\{\xi_n\}$ is Cauchy in $V$ and $\{a_n\}$ is Cauchy in $\mathbb{R}$, then $\{a_n\xi_n\}$ is
Cauchy in $V$.




7.4 Prove that if $\{\xi_n\}$ is a Cauchy sequence in a normed linear space $V$, then
$\{\|\xi_n\|\}$ is a Cauchy sequence in $\mathbb{R}$.

7.5 Prove that if $\{x_n\}$ and $\{y_n\}$ are two Cauchy sequences in a metric space $S$,
then $\{\rho(x_n, y_n)\}$ is a Cauchy sequence in $\mathbb{R}$.

7.6 Prove the statement made after the proof of Lemma 7.2.

7.7 The rational number system is an incomplete metric space. Prove this by
exhibiting a Cauchy sequence of rational numbers that does not converge to a rational
number.

7.8 Prove Theorem 7.1.

7.9 Deduce a strengthened form of Theorem 7.8 from Theorem 7.1.

7.10 Write out a careful proof of Theorem 7.6, modeled on the proof of Theorem 7.5.

7.11 Prove Theorem 7.7.

7.12 Let the metric space $X$ have a dense subset $Y$ such that every Cauchy sequence
in $Y$ is convergent in $X$. Prove that $X$ is complete.
7.13 Show that the set $W$ of all Cauchy sequences in a normed linear space $V$ is
itself a vector space and that a seminorm $p$ can be defined on $W$ by $p(\{\xi_n\}) = \lim \|\xi_n\|$.
(Put this together from the material in the text and the preceding problems.)

7.14 Continuing the above exercise, for each $\xi \in V$, let $\xi^*$ be the constant sequence
all of whose terms are $\xi$. Show that $\theta\colon \xi \mapsto \xi^*$ is an isometric linear injection of $V$ into $W$
and that $\theta[V]$ is dense in $W$ in terms of the seminorm from the above exercise.

7.15 Prove next that every Cauchy sequence in $\theta[V]$ is convergent in $W$. Put Exercises
4.18 of Chapter 3 and 7.12 through 7.14 of this chapter together to conclude that
if $N$ is the set of null Cauchy sequences in $W$, then $W/N$ is a Banach space, and that
$\xi \mapsto \overline{\xi^*}$ (the coset of $\xi^*$) is an isometric linear injection from $V$ to a dense subspace of $W/N$.
This constitutes the standard completion of the normed linear space $V$.

7.16 We shall now sketch a nonstandard way of forming the completion of a metric
space $S$. Choose some point $p_0$ in $S$, and let $V$ be the set of real-valued functions on $S$
such that $f(p_0) = 0$ and $f$ is a Lipschitz function. For $f$ in $V$ define $\|f\|$ as the smallest
Lipschitz constant for $f$. That is,

$$\|f\| = \operatorname*{lub}_{p \ne q} \, \frac{|f(p) - f(q)|}{\rho(p, q)}.$$

Prove that $V$ is a normed linear space under this norm. ($V$ actually is complete, but
we do not need this fact.)

7.17 Continuing the above exercise, we know that the dual space $V^*$ of all bounded
linear functionals on $V$ is complete by Theorem 7.6. We now want to show that $S$ can
be isometrically imbedded in $V^*$; then the closure of $S$ as a subset of $V^*$ will be the
desired completion of $S$. For each $p \in S$, let $\theta_p\colon V \to \mathbb{R}$ be "evaluation at $p$". That is,
$\theta_p(f) = f(p)$. Show that $\theta_p \in V^*$ and that $\|\theta_p - \theta_q\| \le \rho(p, q)$.

7.18 In order to conclude that the mapping $\theta\colon p \mapsto \theta_p$ is an isometry (i.e., is
distance-preserving), we have to prove the opposite inequality $\|\theta_p - \theta_q\| \ge \rho(p, q)$. To do this,
choose $p$ and consider the special function $f(x) = \rho(p, x) - \rho(p, p_0)$. Show that $f$ is
in $V$ and that $\|f\| = 1$ (from an early lemma in the chapter). Now apply the definition
of $\|\theta_p - \theta_q\|$ and conclude that $\theta$ is an isometric injection of $S$ into $V^*$. Then $\overline{\theta[S]}$ is
our constructed completion.




7.19 Prove that if a normed linear space $V$ has the property that every absolutely
convergent series converges, then $V$ is complete. (Let $\{\alpha_n\}$ be a Cauchy sequence.
Show that there is a subsequence $\{\alpha_{n_i}\}_i$ such that if $\xi_i = \alpha_{n_{i+1}} - \alpha_{n_i}$, then $\|\xi_i\| < 2^{-i}$.
Conclude that the subsequence converges and finish up.)

7.20 The above exercise gives a very useful criterion for $V$ to be complete. Use it to
prove that if $V$ is a Banach space and $N$ is a closed subspace, then $V/N$ is a Banach
space (see Exercise 4.14 of Chapter 3 for the norm on $V/N$).


7.21 Prove that the sum of a uniformly convergent series of infinitesimals (all on the 
same domain) is an infinitesimal. 


8. A FIRST LOOK AT BANACH ALGEBRAS


When we were considering the implicit-function theorem and the inverse-function
theorem in the last chapter, we saw how useful it is to know that if a transformation
$T$ has an inverse $T^{-1}$, then so does $S$ whenever $\|S - T\|$ is small enough,
and that the mapping $T \mapsto T^{-1}$ is continuous on the open set of all invertible
elements. When the spaces in question are finite-dimensional, these facts can
be made to follow from the continuity of the determinant function $T \mapsto \Delta(T)$
from Hom $V$ to $\mathbb{R}$. It is also possible to produce them by arguing directly in
terms of upper and lower bounds for $T$ and its close approximations $S$. But the
most natural, most elegant, and (in the case of Banach spaces) easiest way to
prove these things is to show that if $V$ is a Banach space and $T$ in Hom $V$ has
norm less than one, then the sum of the geometric series $\sum_0^\infty T^n$ is the inverse of
$I - T$, just as in the elementary calculus. But in making this argument, the
fact that $T$ is a linear transformation has little importance, and we shall digress
for a moment to explore this situation.

Let us summarize the norm and algebraic properties of Hom $V$ when $V$ is a
Banach space. First of all, we know that Hom $V$ is also a Banach space. Second,
it is an algebra. That is, it possesses an associative multiplication operation
(composition) that relates to the linear operations according to the following
laws:

$$S(T_1 + T_2) = ST_1 + ST_2,$$
$$(S_1 + S_2)T = S_1T + S_2T,$$
$$c(ST) = (cS)T = S(cT).$$

Finally, multiplication is related to the norm by

$$\|ST\| \le \|S\| \, \|T\| \quad\text{and}\quad \|I\| = 1.$$


This list of properties constitutes exactly the axioms for a Banach algebra. 

Just as we can see certain properties of functions most clearly by forgetting 
that they are functions and considering them only as elements of a vector space, 
now it turns out that we can treat certain properties of transformations in 
Hom(V) most simply by forgetting the complicated nature of a linear transfor- 
mation and considering it merely as an element of an abstract Banach algebra A. 




The most important simple thing we can do in a Banach algebra that we
couldn't do in a Banach space is to consider power series. The following theorem
shows that the geometric series, in particular, plays the same central role here
that it plays in elementary calculus. Since we are not thinking of the elements of
$A$ as transformations, we shall designate them by lower-case letters; $e$ is the
identity of $A$.


Theorem 8.1. If $A$ is a Banach algebra, and if $x$ in $A$ has norm less than one,
then $(e - x)$ is invertible and its inverse is the sum of the geometric series
in $x$:

$$(e - x)^{-1} = \sum_0^\infty x^n.$$

Also, $\|e - (e - x)^{-1}\| \le r/(1 - r)$, where $r = \|x\|$.


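Proof. Since $\|x^n\| \le \|x\|^n = r^n$, the series $\sum x^n$ is absolutely convergent when
$\|x\| < 1$ by comparison with the ordinary geometric series $\sum r^n$. It is therefore
convergent, and if $y = \sum_0^\infty x^n$, then

$$(e - x)y = \lim_{n\to\infty} (e - x)\sum_0^n x^i = \lim_{n\to\infty} (e - x^{n+1}) = e,$$

since $\|x^{n+1}\| \le r^{n+1} \to 0$. The same computation gives $y(e - x) = e$, because $y$
commutes with $x$. That is, $y = (e - x)^{-1}$. Finally,

$$\|e - (e - x)^{-1}\| = \Bigl\|\sum_1^\infty x^n\Bigr\| \le \sum_1^\infty r^n = \frac{r}{1 - r}. \quad\square$$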
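
Theorem 8.1 can be tried out in the Banach algebra of real $2 \times 2$ matrices. The following is a minimal sketch under assumptions made here, not in the text: the norm is the maximum absolute row sum (the operator norm for the max norm on $\mathbb{R}^2$), and the matrix `x` and the number of terms are arbitrary illustrative choices.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def norm(a):
    # Maximum absolute row sum: submultiplicative, and norm(identity) == 1.
    return max(sum(abs(v) for v in row) for row in a)

def geometric_series(x, terms=200):
    # Partial sum e + x + x^2 + ... + x^terms of the geometric series in x.
    n = len(x)
    total, power = identity(n), identity(n)
    for _ in range(terms):
        power = mul(power, x)
        total = add(total, power)
    return total

x = [[0.2, -0.3], [0.1, 0.4]]   # illustrative element with norm 1/2
e = identity(2)
r = norm(x)
y = geometric_series(x)

# (e - x) y should be e, and ||e - y|| should not exceed r / (1 - r).
residual = norm(sub(mul(sub(e, x), y), e))
deviation = norm(sub(e, y))
print(r, residual, deviation, r / (1 - r))
```

The truncation error is of order $r^{201}$ here, so the partial sum is indistinguishable from the inverse at machine precision.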


Theorem 8.2. The set $\mathcal{N}$ of invertible elements in a Banach algebra $A$ is
open and the mapping $x \mapsto x^{-1}$ is continuous from $\mathcal{N}$ to $\mathcal{N}$. In fact, if $y^{-1}$
exists and $m = \|y^{-1}\|$, then $(y - h)^{-1}$ exists whenever $\|h\| < 1/m$ and
$\|(y - h)^{-1} - y^{-1}\| \le m^2\|h\|/(1 - m\|h\|)$.


Proof. Set $x = y^{-1}h$. Then $(y - h) = y(e - x)$, where $r = \|x\| = \|y^{-1}h\| \le
m\|h\|$, and so by the above theorem $y - h$ will be invertible, with $(y - h)^{-1} =
(e - x)^{-1}y^{-1}$, provided $\|h\| < 1/m$. Then also

$$\|y^{-1} - (y - h)^{-1}\| = \|(e - (e - x)^{-1})y^{-1}\| \le \|e - (e - x)^{-1}\| \cdot m,$$

and this is bounded above by

$$\frac{mr}{1 - r} \le \frac{m^2\|h\|}{1 - m\|h\|},$$

by the last inequality in the above theorem. □
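
The perturbation estimate of Theorem 8.2 can be spot-checked on $2 \times 2$ matrices. This is a sketch under assumptions chosen here for illustration: the max-row-sum norm, the particular matrices `y` and `h`, and a closed-form $2 \times 2$ inverse standing in for the abstract inverse.

```python
def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def norm(a):
    # Maximum absolute row sum (operator norm for the max norm on R^2).
    return max(abs(row[0]) + abs(row[1]) for row in a)

def inv(a):
    # Closed-form inverse of a 2x2 matrix; assumes a nonzero determinant.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

y = [[2.0, 1.0], [0.0, 3.0]]      # an invertible element (illustrative)
h = [[0.1, -0.2], [0.3, 0.05]]    # a perturbation with ||h|| < 1/m

m = norm(inv(y))
lhs = norm(sub(inv(sub(y, h)), inv(y)))          # ||(y - h)^{-1} - y^{-1}||
rhs = m**2 * norm(h) / (1 - m * norm(h))         # bound from the theorem
print(m * norm(h) < 1, lhs <= rhs)
```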


Corollary. If $V$ and $W$ are Banach spaces, then the invertible elements in
Hom($V$, $W$) form an open set, and the map $T \mapsto T^{-1}$ is continuous on this
domain.


Proof. Suppose that $T^{-1}$ exists, and set $m = \|T^{-1}\|$. Then if $\|T - S\| < 1/m$,
we have $\|I - T^{-1}S\| \le \|T^{-1}\| \, \|T - S\| < 1$, and so $T^{-1}S = I - (I - T^{-1}S)$
is an invertible element of Hom $V$. Therefore, $S = T(T^{-1}S)$ is invertible and
$S^{-1} = (T^{-1}S)^{-1}T^{-1}$. The continuity of $S \mapsto S^{-1}$ is left to the reader. □




We saw above that the map $x \mapsto (e - x)^{-1}$ from the open unit ball $B_1(0)$ in
a Banach algebra $A$ to $A$ is the sum of the geometric power series. We can define
many other mappings by convergent power series, at hardly any greater effort.


Theorem 8.3. Let $A$ be a Banach algebra. Let the sequence $\{a_n\} \subset A$ and
the positive number $\delta$ be such that the sequence $\{\|a_n\|\delta^n\}$ is bounded. Then
$\sum a_nx^n$ converges for $x$ in the ball $B_\delta(0)$ in $A$, and if $0 < s < \delta$, then the
series converges uniformly on $B_s(0)$.


Proof. Set $r = s/\delta$, and let $b$ be a bound for the sequence $\{\|a_n\|\delta^n\}$. On the
ball $B_s(0)$ we then have $\|a_nx^n\| \le \|a_n\|s^n = \|a_n\|\delta^nr^n \le br^n$, and the series
therefore converges uniformly on this ball by comparison with the geometric
series $b\sum r^n$, since $r < 1$. □


The series of most interest to us will have real coefficients. They are included
in the above argument because the product of the vector $x$ and the scalar $t$ is
the algebra product $(te)x$. In addition to dealing with the above geometric series,
we shall be particularly interested in the exponential function $e^x = \sum_0^\infty x^n/n!$.
The usual comparison arguments of the elementary calculus show just as easily
here that this series converges for every $x$ in $A$ and uniformly on any ball.

It is natural to consider the differentiability of the maps from $A$ to $A$ defined
by such convergent series, and we state the basic facts below, starting with a
fundamental theorem on the differentiability of a limit of a sequence.


Theorem 8.4. Let $\{F^n\}$ be a sequence of maps from a ball $B$ in a normed
linear space $V$ to a normed linear space $W$ such that $F^n$ converges pointwise
to a map $F$ on $B$ and such that $\{dF^n_\alpha\}$ converges for each $\alpha$ and uniformly over
$\alpha$. Then $F$ is differentiable on $B$ and $dF_\beta = \lim dF^n_\beta$ for each $\beta$ in $B$.


Proof. Fix $\beta$ and set $T = \lim dF^n_\beta$. By the uniform convergence of $\{dF^n_\alpha\}$,
given $\epsilon$, there is an $N$ such that $\|dF^m_\alpha - dF^n_\alpha\| \le \epsilon$ for all $m, n \ge N$ and for all $\alpha$
in $B$. It then follows from the mean-value theorem for differentials that

$$\|(\Delta F^m_\beta(\xi) - \Delta F^n_\beta(\xi)) - (dF^m_\beta(\xi) - dF^n_\beta(\xi))\| \le 2\epsilon\|\xi\|$$

for all $m, n \ge N$ and all $\xi$ such that $\beta + \xi \in B$. Letting $m \to \infty$, taking $n = N$, and
regrouping, we have

$$\|(\Delta F_\beta(\xi) - T(\xi)) - (\Delta F^N_\beta(\xi) - dF^N_\beta(\xi))\| \le 2\epsilon\|\xi\|$$

for all such $\xi$. But, by the definition of $dF^N_\beta$ there is a $\delta$ such that

$$\|\Delta F^N_\beta(\xi) - dF^N_\beta(\xi)\| \le \epsilon\|\xi\|$$

when $\|\xi\| < \delta$. Putting these last two inequalities together, we see that

$$\|\xi\| < \delta \Rightarrow \|\Delta F_\beta(\xi) - T(\xi)\| \le 3\epsilon\|\xi\|.$$

Thus $F$ is differentiable at $\beta$ and $dF_\beta = T$. □


The remaining proofs are left as a set of exercises. 




Lemma 8.1. Multiplication on a Banach algebra $A$ is differentiable (from
$A \times A$ to $A$). If we let $p$ be the product function, so that $p(x, y) = xy$,
then $dp_{\langle a,b\rangle}(x, y) = ay + xb$.


Lemma 8.2. Let $A$ be a commutative Banach algebra, and let $p$ be the
monomial function $p(x) = ax^n$. Then $p$ is everywhere differentiable and

$$dp_y(x) = nay^{n-1}x.$$


Lemma 8.3. If $\{\|a_n\|r^n\}$ is a bounded sequence in $\mathbb{R}$, then $\{n\|a_n\|s^n\}$ is
bounded for any $0 < s < r$, and therefore $\sum na_nx^{n-1}$ converges uniformly
on any ball in $A$ smaller than $B_r(0)$.


Theorem 8.5. If $A$ is a commutative Banach algebra and $\{a_n\} \subset A$ is such
that $\{\|a_n\|r^n\}$ is bounded in $\mathbb{R}$, then $F(x) = \sum_0^\infty a_nx^n$ is defined and
differentiable on the ball $B_r(0)$ in $A$, and

$$dF_y(x) = \Bigl(\sum_1^\infty na_ny^{n-1}\Bigr) \cdot x.$$


It is natural to call the element $\sum_1^\infty na_ny^{n-1}$ the derivative of $F$ at $y$ and to
designate it $F'(y)$, although this departs from our rule that derivatives are
vectors obtained as limits of difference quotients. The remarkable aspect of the
above theorem is that for this kind of differentiable mapping from an open
subset of $A$ to $A$ the linear transformation $dF_y$ is multiplication by an element
of $A$: $dF_y(x) = F'(y) \cdot x$.

In particular, the exponential function $\exp(x) = e^x = \sum_0^\infty x^n/n!$ is its own
derivative, since $\sum_1^\infty nx^{n-1}/n! = \sum_0^\infty x^m/m!$, and from this fact (see the
exercises) or from direct algebraic manipulation of the series in question, we can
deduce the law of exponents $e^{x+y} = e^xe^y$. Remember, though, that this is on a
commutative Banach algebra. The function $x \mapsto e^x = \sum_0^\infty x^n/n!$ can be defined
just as easily on any Banach algebra $A$, but it is not nearly as pleasant when $A$
is noncommutative. However, one thing that we can always do, and often
thereby save the day, is to restrict the exponential mapping to a commutative
subalgebra of $A$, say that generated by a single element $x$. For example, we can
consider the parametrized arc $\gamma(t) = e^{tx}$ ($x$ fixed) into any Banach algebra $A$,
and, because its range lies in the commutative subalgebra $X$ generated by $x$, we
can apply Theorem 7.2 of Chapter 3 to conclude that $\gamma$ is differentiable and that

$$\gamma'(t) = d\exp_{tx}(x) = xe^{tx}.$$

This can also easily be proved directly from the law of exponents:

$$\Delta\gamma_t(h) = e^{(t+h)x} - e^{tx} = e^{tx}(e^{hx} - 1),$$

and since it is clear from the series that $(e^{hx} - 1)/h \to x$ as $h \to 0$, we have that

$$\gamma'(t) = \lim_{h\to 0} \frac{\Delta\gamma_t(h)}{h} = xe^{tx}.$$
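
The role of commutativity in the law of exponents can be seen numerically in the (noncommutative) Banach algebra of $2 \times 2$ matrices. The sketch below makes its own assumptions: the exponential is approximated by a truncated series, and the matrices `A` and `B` are illustrative choices, with `A` commuting with `2A` but not with `B`.

```python
def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(a, terms=40):
    # Truncated exponential series: sum of a^n / n! for n = 0 .. terms - 1.
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = scale(1.0 / n, mul(term, a))   # term is now a^n / n!
        total = add(total, term)
    return total

def dist(a, b):
    # Largest entrywise difference between two 2x2 matrices.
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]      # AB != BA

# A and 2A commute, so exp(A + 2A) agrees with exp(A) exp(2A).
commuting_gap = dist(expm(add(A, scale(2.0, A))), mul(expm(A), expm(scale(2.0, A))))
# A and B do not commute, and the law of exponents fails visibly.
noncommuting_gap = dist(expm(add(A, B)), mul(expm(A), expm(B)))
print(commuting_gap, noncommuting_gap)
```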


EXERCISES 


8.1 Finish the proof of the corollary of Theorem 8.2. 

8.2 Let $A$ be a Banach algebra, and let $\{a_n\} \subset \mathbb{R}$ and $x \in A$ be such that $\sum a_nx^n$
converges. Suppose also that $x$ satisfies a polynomial identity $p(x) = \sum_0^n b_ix^i = 0$,
where $\{b_i\} \subset \mathbb{R}$ and $b_n \ne 0$. Prove that the element $\sum_0^\infty a_ix^i$ is a polynomial in $x$ of
degree $\le n - 1$. (Let $M$ be the linear span of $\{x^i\}_0^{n-1}$, and show first that $x^i \in M$
for all $i$.)

8.3 Let $A$ be any Banach algebra, let $x$ be a fixed element in $A$, and let $X$ be the
smallest closed subalgebra of $A$ containing $x$. Prove that $X$ is a commutative Banach
algebra. (The set of polynomials $p(x) = \sum_0^n a_ix^i$ is the smallest algebra containing $x$.
Consider its closure in $A$.)

8.4 Prove Lemma 8.1. [Hint: $\langle x, y\rangle \mapsto xy$ is a bounded bilinear map.]

8.5 Prove Lemma 8.2 by making a direct $\Delta$-estimate from the binomial expansion,
as in the elementary calculus.

8.6 Prove Lemma 8.2 by induction from Lemma 8.1.

8.7 Let $A$ be any Banach algebra. Prove that $p\colon x \mapsto x^3$ is differentiable and that
$dp_a(x) = xa^2 + axa + a^2x$.

8.8 Prove by induction that if $q(x) = x^n$, then $q$ is differentiable and

$$dq_a(x) = \sum_{i=0}^{n-1} a^i x a^{n-1-i}.$$

Deduce Lemma 8.2 as a corollary.


8.9 Let $A$ be any Banach algebra. Prove that $r\colon x \mapsto x^{-1}$ is everywhere differentiable
on the open set $U$ of invertible elements and that

$$dr_a(x) = -a^{-1}xa^{-1}.$$

[Hint: Examine the proofs of Theorems 8.1 and 8.2.]


8.10 Let $A$ be an open subset of a normed linear space $V$, and let $F$ and $G$ be mappings
from $A$ to a Banach algebra $X$ that are differentiable at $\alpha$. Prove that the product
mapping $FG$ is differentiable at $\alpha$ and that $d(FG)_\alpha = F(\alpha)\,dG_\alpha + dF_\alpha\,G(\alpha)$. Does it
follow that $d(F^2)_\alpha = 2F(\alpha)\,dF_\alpha$?

8.11 Continuing the above exercise, show that if $X$ is a commutative Banach algebra,
then $d(F^n)_\alpha = nF^{n-1}(\alpha)\,dF_\alpha$.

8.12 Let $F\colon A \to X$ be a differentiable map from an open set $A$ of a normed linear
space to a Banach algebra $X$, and suppose that the element $F(\xi)$ is invertible in $X$
for every $\xi$ in $A$. Prove that the map $G\colon \xi \mapsto [F(\xi)]^{-1}$ is differentiable and that
$dG_\alpha(\xi) = -F(\alpha)^{-1}\,dF_\alpha(\xi)\,F(\alpha)^{-1}$. Show also that if $F$ is a parametrized arc ($A = I \subset \mathbb{R}$),
then $G'(a) = -F(a)^{-1} \cdot F'(a) \cdot F(a)^{-1}$.

8.13 Prove Lemma 8.3. 

8.14 Prove Theorem 8.5 by showing that Lemma 8.3 makes Theorem 8.4 applicable. 


8.15 Show that in Theorem 8.4 the convergence of $F^n$ to $F$ needs only to be assumed
at one point, provided we know that the codomain space $W$ is a Banach space.




8.16 We want to prove the law of exponents for the exponential function on a
commutative Banach algebra. Show first that $(\exp(-x))(\exp x) = e$ by applying Exercise
7.13 of Chapter 3, the above Exercise 8.10, and the fact that $d\exp_a(x) = (\exp a)x$.

8.17 Show that if $X$ is a commutative Banach algebra and $F\colon X \to X$ is a
differentiable map such that $dF_\alpha(\xi) = \xi F(\alpha)$, then $F(\xi) = \beta \exp \xi$ for some constant $\beta$.
[Consider the differential of $F(\xi)\exp(-\xi)$.]

8.18 Now set $F(\xi) = \exp(\xi + \eta)$ and prove from the above exercise that

$$\exp(\xi + \eta) = \exp(\xi)\exp(\eta).$$

You will also need the fact that $\exp 0 = 1$.

8.19 Let $z$ be a nilpotent element in a commutative Banach algebra $X$. That is,
$z^p = 0$ for some positive integer $p$. Show by an elementary estimate based on the
binomial expansion that if $\|x\| < 1$, then $\|(x + z)^n\| \le kn^p\|x\|^{n-p}$ for $n > p$. The
series of positive terms $\sum n^pr^n$ converges for $r < 1$ (by the ratio test). Show, therefore,
that the series for $\log(1 - (x + z))$ and for $(1 - (x + z))^{-1}$ converge when $\|x\| < 1$.

8.20 Continuing the above exercise, show that $F(y) = \log(1 - y)$ is defined and
differentiable on the ball $\|y - z\| < 1$ and that $dF_a(x) = -(1 - a)^{-1} \cdot x$. Show,
therefore, that $\exp(\log(1 - y)) = 1 - y$ on this ball, either by applying the inverse
mapping theorem or by applying the composite function rule for differentiating.
Conclude that for every nilpotent element $z$ in $X$ there exists a $u$ in $X$ such that
$\exp u = 1 - z$.

8.21 Let $X_1, \ldots, X_n$ be Banach algebras. Show that the product Banach space
$X = \prod_1^n X_i$ becomes a Banach algebra if the product $xy = \langle x_1, \ldots, x_n\rangle\langle y_1, \ldots, y_n\rangle$
is defined as $\langle x_1y_1, \ldots, x_ny_n\rangle$ and if the maximum norm is used on $X$.

8.22 In the above situation the projections $\pi_i$ have now become bounded algebra
homomorphisms. In fact, just as in our original vector definitions on a product space,
our definition of multiplication on $X$ was determined by the requirement that $\pi_i(xy) =
\pi_i(x)\pi_i(y)$ for all $i$. State and prove an algebra theorem analogous to Theorem 3.4 of
Chapter 1.

8.23 Continuing the above discussion, suppose that the series $\sum a_nx^n$ converges in $X$,
with sum $y$. Show that then $\sum (a_n)_i(x_i)^n$ converges in $X_i$ to $y_i$ for each $i$, where, of
course, $y = \langle y_1, \ldots, y_n\rangle$. Conclude that $e^x = \langle e^{x_1}, \ldots, e^{x_n}\rangle$ for any $x =
\langle x_1, \ldots, x_n\rangle$ in $X$.

8.24 Define the sine and cosine functions on a commutative Banach algebra, and 
show that sin’ = cos, cos’ = —sin, sin? + cos? = e. 


9. THE CONTRACTION MAPPING FIXED-POINT THEOREM 


In this section we shall prove the very simple and elegant fixed-point theorem for
contraction mappings, and then shall use it to complete the proof of the
implicit-function theorem. Later, in Chapter 6, it will be the basis of our proof of the
fundamental existence and uniqueness theorem for ordinary differential equations.
The section concludes with a comparison of the iterative procedure of the
fixed-point theorem and that of Newton's method.




A mapping $K$ from a metric space $X$ to itself is a contraction if it is a Lipschitz
mapping with constant less than 1; that is, if there is a constant $C$ with $0 < C < 1$
such that $\rho(K(x), K(y)) \le C\rho(x, y)$ for all $x, y \in X$. A fixed point of $K$ is, of
course, a point $x$ such that $K(x) = x$.

A contraction $K$ can have at most one fixed point, since if $K(x) = x$ and
$K(y) = y$, then $\rho(x, y) = \rho(K(x), K(y)) \le C\rho(x, y)$, and so $(1 - C)\rho(x, y) \le 0$.
Since $C < 1$, this implies that $\rho(x, y) = 0$ and $x = y$.


Theorem 9.1. Let $X$ be a nonempty complete metric space, and let
$K\colon X \to X$ be a contraction. Then $K$ has a (unique) fixed point.


Proof. Choose any $x_0$ in $X$, and define the sequence $\{x_n\}_0^\infty$ inductively by setting
$x_1 = K(x_0)$, $x_2 = K(x_1) = K^2(x_0)$, and $x_n = K(x_{n-1}) = K^n(x_0)$. Set $\delta =
\rho(x_1, x_0)$. Then $\rho(x_2, x_1) = \rho(K(x_1), K(x_0)) \le C\rho(x_1, x_0) = C\delta$, and, by
induction,

$$\rho(x_{n+1}, x_n) = \rho(K(x_n), K(x_{n-1})) \le C\rho(x_n, x_{n-1}) \le C(C^{n-1}\delta) = C^n\delta.$$

It follows that $\{x_n\}$ is Cauchy, for if $m > n$, then

$$\rho(x_m, x_n) \le \sum_{i=n}^{m-1} \rho(x_{i+1}, x_i) \le \sum_{i=n}^{m-1} C^i\delta < \frac{C^n\delta}{1 - C},$$

and $C^n \to 0$ as $n \to \infty$, because $C < 1$. Since $X$ is complete, $\{x_n\}$ converges to
some $a$ in $X$, and it then follows that $K(a) = \lim K(x_n) = \lim x_{n+1} = a$, so
that $a$ is a fixed point. □
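
The iteration in the proof is easy to run. As a sketch under assumptions made here (not in the text), take $X = [0, 1]$ and $K = \cos$: the cosine maps $[0, 1]$ into itself and, by the mean-value theorem, is Lipschitz there with constant $C = \sin 1 < 1$. The code also checks the estimate $\rho(x_n, a) \le C^n\delta/(1 - C)$ obtained by letting $m \to \infty$ in the Cauchy inequality above.

```python
import math

def iterate(K, x0, steps):
    """Return the list [x0, K(x0), K(K(x0)), ...] with steps applications of K."""
    xs = [x0]
    for _ in range(steps):
        xs.append(K(xs[-1]))
    return xs

C = math.sin(1.0)                 # Lipschitz constant of cos on [0, 1]
xs = iterate(math.cos, 0.0, 200)  # x0 = 0 is an arbitrary starting point
a = xs[-1]                        # numerical stand-in for the fixed point
delta = abs(xs[1] - xs[0])        # distance moved by the first step

# A priori estimate: the n-th iterate is within C^n * delta / (1 - C) of a.
within_bound = all(abs(xs[n] - a) <= C**n * delta / (1 - C) + 1e-12
                   for n in range(50))
print(a, abs(math.cos(a) - a), within_bound)
```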


In practice, we meet mappings $K$ that are contractions only near some
particular point $p$, and we have to establish that a suitable neighborhood of $p$
is carried into itself by $K$. We show below that if $K$ is a contraction on a ball
about $p$, and if $K$ doesn't move the center $p$ very far, then the theorem can
be applied.


Corollary 1. Let $D$ be a closed ball in a complete metric space $X$, and let
$K\colon D \to X$ be a contraction which moves the center of $D$ a distance at
most $(1 - C)r$, where $r$ is the radius of $D$ and $C$ is the contraction constant.
Then $K$ has a unique fixed point and it is in $D$.


Proof. We simply check that the range of $K$ is actually in $D$. If $p$ is the center
of $D$ and $x$ is any point in $D$, then

$$\rho(K(x), p) \le \rho(K(x), K(p)) + \rho(K(p), p) \le C\rho(x, p) + (1 - C)r \le Cr + (1 - C)r = r. \quad\square$$


Corollary 2. Let $B$ be an open ball in a complete metric space $X$, and let
$K\colon B \to X$ be a contraction which moves the center of $B$ a distance less than
$(1 - C)r$, where $r$ is the radius of $B$ and $C$ is the contraction constant.
Then $K$ has a unique fixed point.


Proof. Restrict $K$ to any slightly smaller closed ball $D$ concentric with $B$, and
apply the above corollary. □




Corollary 3. Let $K$ be a contraction on the complete metric space $X$, and
suppose that $K$ moves the point $x$ a distance $d$. Then the distance from $x$ to
the fixed point is at most $d/(1 - C)$, where $C$ is the contraction constant.


Proof. Let $D$ be the closed ball about $x$ of radius $r = d/(1 - C)$, and apply
Corollary 1 to the restriction of $K$ to $D$. It implies that the fixed point is in $D$. □
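
Corollary 3 is an error estimate that needs no knowledge of the fixed point: one application of $K$ at $x$ already bounds the distance from $x$ to it. A minimal sketch, assuming the illustrative contraction $K(x) = \tfrac12\cos x$ on $\mathbb{R}$ (constant $C = \tfrac12$) and a few arbitrary sample points:

```python
import math

C = 0.5

def K(x):
    # Illustrative contraction on the real line with constant C = 1/2.
    return C * math.cos(x)

def fixed_point(steps=200):
    # Iterate K from 0; used here only to compare against the bound.
    x = 0.0
    for _ in range(steps):
        x = K(x)
    return x

a = fixed_point()
samples = [-4.0, -1.0, 0.0, 0.5, 3.0, 10.0]

# For each x, d = |K(x) - x| and the corollary promises |x - a| <= d / (1 - C).
checks = [(abs(x - a), abs(K(x) - x) / (1 - C)) for x in samples]
all_within = all(actual <= bound + 1e-12 for actual, bound in checks)
print(all_within)
```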


We now suppose that the contraction $K$ contains a parameter $s$, so that $K$
is now a function of two variables $K(s, x)$. We shall assume that $K$ is a
contraction in $x$ uniformly over $s$, which means that $\rho(K(s, x), K(s, y)) \le C\rho(x, y)$
for all $x$, $y$, and $s$, where $0 < C < 1$. We shall also assume that $K$ is a
continuous function of $s$ for each fixed $x$.


Corollary 4. Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete
metric space and $S$ is any metric space, and suppose that $K(s, x)$ is a
contraction in $x$ uniformly over $s$ and is continuous in $s$ for each $x$. Then the
fixed point $p_s$ is a continuous function of $s$.


Proof. Given $\epsilon$, we use the continuity of $K$ in its first variable around the point
$\langle t, p_t\rangle$ to choose $\delta$ so that if $\rho(s, t) < \delta$, then the distance from $K(s, p_t)$ to
$K(t, p_t)$ is at most $\epsilon$. Since $K(t, p_t) = p_t$, this simply says that the contraction
with parameter value $s$ moves $p_t$ a distance at most $\epsilon$, and so the distance from
$p_t$ to the fixed point $p_s$ is at most $\epsilon/(1 - C)$ by Corollary 3. That is, $\rho(s, t) <
\delta \Rightarrow \rho(p_s, p_t) \le \epsilon/(1 - C)$, where $C$ is the uniform contraction constant, and
the mapping $s \mapsto p_s$ is accordingly continuous at $t$. □
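
The estimate in this proof can be watched numerically. In the sketch below, the family $K(s, x) = \tfrac12\cos x + s$ is an illustrative choice made here: it is a contraction in $x$ with constant $C = \tfrac12$ uniformly over $s$, and changing $s$ to $t$ moves each point by exactly $|s - t|$, so the argument above gives $|p_s - p_t| \le |s - t|/(1 - C)$.

```python
import math

C = 0.5

def K(s, x):
    # Illustrative parametrized contraction: constant C in x, uniformly over s.
    return C * math.cos(x) + s

def fixed_point(s, steps=200):
    # Fixed point p_s of x -> K(s, x), found by plain iteration from 0.
    x = 0.0
    for _ in range(steps):
        x = K(s, x)
    return x

params = [k / 10 for k in range(-10, 11)]   # parameter values s in [-1, 1]
points = [fixed_point(s) for s in params]

# Ratio |p_s - p_t| / |s - t| for neighboring parameters; bounded by 1 / (1 - C).
worst_ratio = max(abs(points[i + 1] - points[i]) / abs(params[i + 1] - params[i])
                  for i in range(len(params) - 1))
print(worst_ratio, 1 / (1 - C))
```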


Combining Corollaries 2 and 4, we have the following theorem. 


Theorem 9.2. Let $B$ be a ball in a complete metric space $X$, let $S$ be any
metric space, and let $K$ be a mapping from $S \times B$ to $X$ which is a contraction
in its second variable uniformly over its first variable and is continuous in its
first variable for each value of its second variable. Suppose also that $K$
moves the center of $B$ a distance less than $(1 - C)r$ for every $s$ in $S$, where $r$
is the radius of $B$ and $C$ is the uniform contraction constant. Then for each $s$
in $S$ there is a unique $p$ in $B$ such that $K(s, p) = p$, and the mapping
$s \mapsto p$ is continuous from $S$ to $B$.


We can now complete the proof of the implicit-function theorem. 


Theorem 9.3. Let $V$, $W$, and $X$ be Banach spaces, let $A \times B$ be an open
subset of $V \times W$, and let $G\colon A \times B \to X$ be continuous and have a
continuous second partial differential. Suppose that the point $\langle\alpha, \beta\rangle$ in
$A \times B$ is such that $G(\alpha, \beta) = 0$ and $dG^2_{\langle\alpha,\beta\rangle}$ is invertible. Then there are
open balls $M$ and $N$ about $\alpha$ and $\beta$, respectively, such that for each $\xi$ in $M$
there is a unique $\eta$ in $N$ satisfying $G(\xi, \eta) = 0$. The function $F$ thus
uniquely defined near $\langle\alpha, \beta\rangle$ by the condition $G(\xi, F(\xi)) = 0$ is continuous.


Proof. Set $T = dG^2_{\langle\alpha,\beta\rangle}$ and $K(\xi, \eta) = \eta - T^{-1}(G(\xi, \eta))$. Then $K$ is a
continuous mapping from $A \times B$ to $W$ such that $K(\alpha, \beta) = \beta$, and $K$ has a
continuous second partial differential such that $dK^2_{\langle\alpha,\beta\rangle} = 0$. Because $dK^2_{\langle\mu,\nu\rangle}$ is
a continuous function of $\langle\mu, \nu\rangle$, we can choose a product ball $M \times N$ about
$\langle\alpha, \beta\rangle$ on which $dK^2_{\langle\mu,\nu\rangle}$ is bounded by $\tfrac12$, and we can then decrease the ball $M$
if necessary so that for $\mu$ in $M$ we also have $\|K(\mu, \beta) - \beta\| < r/2$, where $r$ is the
radius of the ball $N$. The mean-value theorem for differentials implies that $K$ is
a contraction in its second variable with constant $\tfrac12$. The preceding theorem
therefore shows that for each $\xi$ in $M$ there is a unique $\eta$ in $N$ such that $K(\xi, \eta) =
\eta$ and the mapping $F\colon \xi \mapsto \eta$ is continuous. Since $K(\xi, \eta) = \eta$ if and only if
$G(\xi, \eta) = 0$, we are done. □


Theorems 8.2 and 9.3 complete the list of ingredients of the implicit-function 
theorem. (However, see Exercise 9.8.) 

We next show, in the other direction, that if a contraction depending on a
parameter is continuously differentiable, then the fixed point is a continuously
differentiable function of the parameter.


Theorem 9.4. Let $V$ and $W$ be Banach spaces, and let $K$ be a differentiable
mapping from an open subset $A \times B$ of $V \times W$ to $W$ which satisfies the
hypotheses of Theorem 9.2. Then the function $F$ from $A$ to $B$ uniquely
defined by the equation $K(\xi, F(\xi)) = F(\xi)$ is differentiable.


Proof. The inequality $\|K(\xi, \eta') - K(\xi, \eta)\| \le C\|\eta' - \eta\|$ is equivalent to
$\|dK^2_{\langle\alpha,\beta\rangle}\| \le C$ for all $\langle\alpha, \beta\rangle$ in $A \times B$. We now define $G$ by $G(\xi, \eta) =
\eta - K(\xi, \eta)$, and observe that $dG^2 = I - dK^2$ and that $dG^2$ is therefore
invertible by Theorem 8.1. Since $G(\xi, F(\xi)) = 0$, it follows from Theorem 11.1
of Chapter 3 that $F$ is differentiable and that its differential is obtained by
differentiating the above equation.


Corollary. If K is continuously differentiable, then so is F. 


* We should emphasize that the fixed-point theorem not only has the
implicit-function theorem as a consequence, but the proof of the fixed-point theorem
gives an iterative procedure for actually finding the value of $F(\xi)$, once we
know how to compute $T^{-1}$ (where $T = dG^2_{\langle\alpha,\beta\rangle}$). In fact, for a given value of
$\xi$ in a small enough ball about $\langle\alpha, \beta\rangle$ consider the function $G(\xi, \cdot)$. If we
set $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, then the inductive procedure

$$\eta_{i+1} = K(\xi, \eta_i)$$

becomes

$$\eta_{i+1} - \eta_i = -T^{-1}G(\xi, \eta_i). \tag{9.1}$$

The meaning of this iterative procedure is easily seen by studying the graph of
the situation where $V = W = \mathbb{R}^1$. (See Fig. 4.4.) As was proved above, under
suitable hypotheses, the series $\sum \|\eta_{i+1} - \eta_i\|$ converges geometrically.

It is instructive to compare this procedure with Newton's method of elementary
calculus. There the iterative scheme (9.1) is replaced by

$$\eta_{i+1} - \eta_i = -S_i^{-1}G(\xi, \eta_i), \tag{9.2}$$

where $S_i = dG^2_{\langle\xi,\eta_i\rangle}$. (See Fig. 4.5.) As we shall see, this procedure (when it
works) converges much more rapidly than (9.1), but it suffers from the
disadvantage that we must be able to compute the inverses of an infinite number of
linear transformations $S_i$.
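
The difference between the two schemes shows up already in one variable. The sketch below uses a scalar equation chosen here purely for illustration, $G(\xi, \eta) = e^\eta - 1 - \xi$ with $\xi = 0.3$ and base point $\eta = 0$, so that $T = 1$; the stopping tolerance and step cap are likewise assumptions. Scheme (9.1) reuses the fixed $T$, while scheme (9.2) recomputes $S_i$ at every step.

```python
import math

XI = 0.3   # fixed parameter value (illustrative)

def G(eta):
    return math.exp(eta) - 1.0 - XI

def dG(eta):
    # Derivative of G with respect to eta.
    return math.exp(eta)

def fixed_T_iteration(tol=1e-12, max_steps=200):
    # Scheme (9.1): always divide by T = dG at the base point eta = 0.
    T = dG(0.0)
    eta, steps = 0.0, 0
    while abs(G(eta)) > tol and steps < max_steps:
        eta -= G(eta) / T
        steps += 1
    return eta, steps

def newton_iteration(tol=1e-12, max_steps=200):
    # Scheme (9.2): divide by S_i = dG at the current iterate.
    eta, steps = 0.0, 0
    while abs(G(eta)) > tol and steps < max_steps:
        eta -= G(eta) / dG(eta)
        steps += 1
    return eta, steps

eta_fixed, steps_fixed = fixed_T_iteration()
eta_newton, steps_newton = newton_iteration()
print(steps_fixed, steps_newton, math.log(1.0 + XI))
```

Both runs approach $\log(1 + \xi)$; the fixed-$T$ scheme gains roughly a constant factor per step, while Newton's scheme roughly squares the error.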


Fig. 4.5 


Let us suppress the $\xi$ which will be fixed in the argument and consider a map
$G$ defined in some neighborhood of the origin in a Banach space. Suppose that $G$
has two continuous differentials. For definiteness, we assume that $G$ is defined
in the unit ball, $B$, and we suppose that for each $x \in B$ the map $dG_x$ is invertible
and, in fact,

$$\|dG_x^{-1}\| \le K, \qquad \|d^2G_x\| \le K.$$

Let $x_0 = 0$ and, assuming that $x_n$ has been defined, we set

$$x_{n+1} = x_n - S_n^{-1}G(x_n),$$

where $S_n = dG_{x_n}$. We shall show that if $\|G(0)\|$ is sufficiently small (in terms
of $K$), then the procedure is well defined (that is, $\|x_{n+1}\| < 1$) and converges
rapidly. In fact, if $\tau$ is any real number between one and two (for instance
$\tau = \tfrac32$), we shall show that for some $c$ (which can be made large if $\|G(0)\|$ is
small)

$$\|x_n - x_{n-1}\| \le e^{-c\tau^n}. \tag{$*$}$$



Note that if we can establish ($*$) for large enough $c$, then $\|x_n\| \le 1$ follows.
In fact, since $\tau^i \ge i(\tau - 1)$,

$$\|x_n\| \le \sum_{i=1}^{n} \|x_i - x_{i-1}\| \le \sum_{i=1}^{\infty} e^{-c\tau^i} \le \sum_{i=1}^{\infty} e^{-ci(\tau-1)} = \frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}},$$

which is $< 1$ if $c$ is large. Let us try to prove ($*$) by induction. Assuming it true
for $n$, we have

$$\|x_{n+1} - x_n\| = \|S_n^{-1}G(x_n)\| \le K\|G(x_{n-1} - S_{n-1}^{-1}G(x_{n-1}))\|$$
$$\le K\{\|G(x_{n-1}) - dG_{x_{n-1}}S_{n-1}^{-1}G(x_{n-1})\| + K\|x_n - x_{n-1}\|^2\}$$

by Taylor's theorem. Now the first term on the right of the inequality vanishes,
and we have

$$\|x_{n+1} - x_n\| \le K^2\|x_n - x_{n-1}\|^2 \le K^2e^{-2c\tau^n}.$$

For the induction to work we must have

$$K^2e^{-2c\tau^n} \le e^{-c\tau^{n+1}}$$

or

$$K^2 \le e^{c(2-\tau)\tau^n}. \tag{$**$}$$

Since $\tau < 2$, this last inequality can be arranged by choosing $c$ sufficiently
large. We must still verify ($*$) for $n = 1$. This says that

$$\|S_0^{-1}G(0)\| \le e^{-c\tau}$$

or

$$\|G(0)\| \le \frac{e^{-c\tau}}{K}. \tag{$***$}$$

In summary, for $1 < \tau < 2$ choose $c$ so that $K^2 \le e^{c(2-\tau)\tau}$ and

$$\frac{e^{-c(\tau-1)}}{1 - e^{-c(\tau-1)}} \le 1.$$

Then if ($***$) holds, the sequence $x_n$ converges exponentially, that is, ($*$) holds.
If $x = \lim x_n$, then $G(x) = \lim G(x_n) = \lim S_n(x_n - x_{n+1}) = 0$. This is
Newton's method.

As a possible choice of $c$ and $\tau$, let $\tau = \tfrac32$, and let $c$ be given by $K^2 = e^{3c/4}$,
so that ($**$) just holds. We may also assume that $K \ge 2^{3/4}$, so that $e^{3c/4} \ge 4^{3/4}$
or $e^c \ge 4$, which guarantees that $e^{-c/2} \le \tfrac12$, implying that $e^{-c/2}/(1 - e^{-c/2}) \le
1$. Then ($***$) becomes the requirement $\|G(0)\| \le K^{-5}$.

We end this section with an example of the fixed-point iterative procedure in
its simplest context, that of the inverse-mapping theorem. We suppose that
$H(0) = 0$ and that $dH_0^{-1}$ exists, and we want to invert $H$ near zero, i.e., solve
the equation $H(\eta) - \xi = 0$ for $\eta$ in terms of $\xi$. Our theory above tells us that
the $\eta$ corresponding to $\xi$ will be the fixed point of the contraction $K(\xi, \eta) =
\eta - T^{-1}H(\eta) + T^{-1}(\xi)$, where $T = dH_0$. In order to make our example as
simple as possible, we shall take $H$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ and choose it so that $dH_0 = I$.
Also, in order to avoid indices, we shall use the mongrel notation $\mathbf{x} = \langle x, y\rangle$,
$\mathbf{u} = \langle u, v\rangle$.

Consider the mapping $\mathbf{x} = H(\mathbf{u})$ defined by $x = u + v^2$, $y = u^3 + v$.
The Jacobian matrix

$$\begin{bmatrix} 1 & 2v \\ 3u^2 & 1 \end{bmatrix}$$

is clearly the identity at the origin. Moreover, in the expression $K(\mathbf{x}, \mathbf{u}) =
\mathbf{x} + \mathbf{u} - H(\mathbf{u})$, the difference $H(\mathbf{u}) - \mathbf{u}$ is just the function $J(\mathbf{u}) = \langle v^2, u^3\rangle$.
This cancellation of the first-order terms is the practical expression of the fact
that in forming $K(\xi, \eta) = \eta - T^{-1}G(\xi, \eta)$, we have acted to make $dK^2 = 0$ at
the "center point" (the origin here). We naturally start the iteration with
$\mathbf{u}_0 = 0$, and then our fixed-point sequence proceeds

$$\mathbf{u}_1 = K(\mathbf{x}, \mathbf{u}_0) = K(\mathbf{x}, 0), \;\ldots,\; \mathbf{u}_n = K(\mathbf{x}, \mathbf{u}_{n-1}).$$

Thus $\mathbf{u}_0 = 0$ and $\mathbf{u}_n = K(\mathbf{x}, \mathbf{u}_{n-1}) = \mathbf{x} - J(\mathbf{u}_{n-1})$, giving

$$u_1 = x, \qquad v_1 = y,$$
$$u_2 = x - y^2, \qquad v_2 = y - x^3,$$
$$u_3 = x - (y - x^3)^2, \qquad v_3 = y - (x - y^2)^3,$$
$$u_4 = x - [y - (x - y^2)^3]^2, \qquad v_4 = y - [x - (y - x^3)^2]^3.$$

We are guaranteed that this sequence $\mathbf{u}_n$ will converge geometrically provided
the starting point $\mathbf{x}$ is close enough to 0, and it seems clear that these two
sequences of polynomials are computing the Taylor series expansions for the
inverse functions $u(x, y)$ and $v(x, y)$. We shall ask the reader to prove this in an
exercise. The two Taylor series start out

$$u(x, y) = x - y^2 + 2yx^3 + \cdots,$$
$$v(x, y) = y - x^3 + 3x^2y^2 + \cdots.$$
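
The iteration $\mathbf{u}_n = \mathbf{x} - J(\mathbf{u}_{n-1})$ for this example can be run directly. In the sketch below the sample point $\langle x, y\rangle = \langle 0.01, 0.02\rangle$, the step count, and the function names are choices made here for illustration. The result is checked by substituting back into $H$ and by comparing with the leading terms of the two Taylor series.

```python
def H(u, v):
    # The mapping of the example: x = u + v^2, y = u^3 + v.
    return (u + v**2, u**3 + v)

def invert_H(x, y, steps=30):
    # Fixed-point iteration u_n = x - J(u_{n-1}) with J(u, v) = (v^2, u^3),
    # started at u_0 = (0, 0). Both components use the previous iterate.
    u, v = 0.0, 0.0
    for _ in range(steps):
        u, v = x - v**2, y - u**3
    return u, v

x, y = 0.01, 0.02                 # a point close to the origin (illustrative)
u, v = invert_H(x, y)

# Substituting back should reproduce (x, y).
hx, hy = H(u, v)
residual = max(abs(hx - x), abs(hy - y))

# Leading terms of the Taylor series for the inverse functions.
u_taylor = x - y**2 + 2 * y * x**3
v_taylor = y - x**3 + 3 * x**2 * y**2
print(residual, abs(u - u_taylor), abs(v - v_taylor))
```

The remaining discrepancy from the truncated series is of the size of the first omitted terms, which have degree five at this point.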


EXERCISES 


9.1 Let $B$ be a compact subset of a normed linear space such that $rB \subset B$ for all
$r \in [0, 1]$. Suppose that $F\colon B \to B$ is a Lipschitz mapping with constant 1 (i.e.,
$\|F(\xi) - F(\eta)\| \le \|\xi - \eta\|$ for all $\xi, \eta \in B$). Prove that $F$ has a fixed point. [Hint:
Consider first $G = rF$ for $0 < r < 1$.]


9.2 Give an example to show that the fixed point in the above exercise may not be 
unique. 

9.3 Let $X$ be a compact metric space, and let $K: X \to X$ “reduce each nonzero
distance”. That is, $\rho(K(x), K(y)) < \rho(x, y)$ if $x \ne y$. Prove that $K$ has a unique
fixed point. (Show that otherwise glb $\{\rho(K(x), x)\}$ is positive and achieved as a
minimum. Then get a contradiction.)


9.4 Let $K$ be a mapping from $S \times X$ to $X$, where $X$ is a complete metric space and $S$
is any metric space, and suppose that $K(s, x)$ is a contraction in $x$ uniformly over $s$ and
is Lipschitz continuous in $s$ uniformly over $x$. Show that the fixed point $p_s$ is a Lipschitz
continuous function of $s$. [Hint: Modify the $\epsilon,\delta$-beginning of the proof of Corollary
4 of Theorem 9.1.]


9.5 Let $D$ be an open subset of a Banach space $V$, and let $K: D \to V$ be such that
$I - K$ is Lipschitz with constant $\tfrac12$.
a) Show that if $B_r(\alpha) \subset D$ and $\beta = K(\alpha)$, then $B_{r/2}(\beta) \subset K[D]$. (Apply a corollary
of the fixed-point theorem to a certain simple contraction mapping.)
b) Conclude that $K$ is injective and has an open range, and that $K^{-1}$ is Lipschitz
with constant 2.


9.6 Deduce an improved version of the result in Exercise 3.20, Chapter 3, from the 
result in the above exercise. 


9.7 In the context of Theorem 9.3, show that $dG^2_{\langle\alpha,\beta\rangle}$ is invertible if $\|dK^2_{\langle\alpha,\beta\rangle}\| < 1$.
(Do not be confused by the notation. We merely want to know that $S$ is invertible if
$\|I - T^{-1} \circ S\| < 1$.)

9.8 There is a slight discrepancy between the statements of Theorem 11.2 in Chapter
3 and Theorem 9.3. In the one case we assert the existence of a unique continuous
mapping from a ball $M$, and in the other case, from the ball $M$ to the ball $N$. Show
that the requirement that the range be in $N$ can be dropped by showing that two
continuous solutions must agree on $M$. (Use the point-by-point uniqueness of
Theorem 9.3.)

9.9 Compute the expression for $dF_\xi$ from the identity $G(\xi, F(\xi)) = 0$ in Theorem
9.4, and show that if $K$ is continuously differentiable, then all the maps involved in the
solution expression are continuous and that $\xi \mapsto dF_\xi$ is therefore continuous.

9.10 Going back to the example worked out at the end of Section 9, show by induction
that the polynomials $u_n - u_{n-1}$ and $v_n - v_{n-1}$ contain no terms of degree less than $n$.

9.11 Continuing the above exercise, show therefore that the power series defined by
taking the terms of degree at most $n$ from $u_n$ is convergent in a ball about 0 and that its
sum is the first component $u(x, y)$ of the mapping inverse to $H$.

9.12 The above conclusions hold generally. Let $J = \langle K, L\rangle$ be any mapping
from a ball about 0 in $\mathbb{R}^2$ to $\mathbb{R}^2$ defined by the convergent power series

$$K(x, y) = \sum a_{ij}x^iy^j, \qquad L(x, y) = \sum b_{ij}x^iy^j$$

in which there are no terms of degree 0 or 1. With the conventions $\mathbf{x} = \langle x, y\rangle$ and
$\mathbf{u} = \langle u, v\rangle$, consider the iterative sequence

$$\mathbf{u}_0 = 0, \qquad \mathbf{u}_n = \mathbf{x} - J(\mathbf{u}_{n-1}).$$

Make any necessary assumptions about what happens when one power series is
substituted in another, and show by induction that $\mathbf{u}_n - \mathbf{u}_{n-1}$ contains no terms of
degree less than $n$, and therefore that the $\mathbf{u}_n$ define a convergent power series whose
sum is the function $\mathbf{u}(x, y) = \langle u(x, y), v(x, y)\rangle$ inverse to $H$ in a neighborhood of 0.
[Remember that $J(\eta) = H(\eta) - \eta$.]


9.13 Let $A$ be a Banach algebra, and let $x$ be an element of $A$ of norm less than 1.
Show that

$$(e - x)^{-1} = \prod_{i=1}^{\infty} \left(e + x^{2^{i-1}}\right).$$

This means that if $\pi_n$ is the partial product $\prod_{i=1}^{n} (e + x^{2^{i-1}})$, then $\pi_n \to (e - x)^{-1}$.
[Hint: Prove by induction that $(e - x)\pi_{n-1} = e - x^{2^{n-1}}$.]

This is another example of convergence at an exponential rate, like Newton's
method in the text.
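
The rate of convergence can be seen numerically in the simplest Banach algebra, the real numbers. The sketch below is only an illustration and assumes the partial products are $\pi_n = (1 + x)(1 + x^2)(1 + x^4)\cdots(1 + x^{2^{n-1}})$; the function name `partial_product` and the value $x = 0.5$ are hypothetical choices.

```python
def partial_product(x, n):
    """pi_n = (1 + x)(1 + x^2)(1 + x^4)...(1 + x^(2^(n-1))) for a real number x."""
    product, power = 1.0, x
    for _ in range(n):
        product *= 1.0 + power
        power *= power  # squaring takes x^(2^k) to x^(2^(k+1))
    return product


x = 0.5                       # any real number with |x| < 1 will do
target = 1.0 / (1.0 - x)
for n in range(1, 7):
    # The error is x^(2^n) / (1 - x), so the exponent doubles at each step.
    print(n, partial_product(x, n), abs(partial_product(x, n) - target))
```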


10. THE INTEGRAL OF A PARAMETRIZED ARC 


In this section we shall make our final application of completeness. We first
prove a very general extension theorem, and then apply it to the construction of
the Riemann integral as an extension of an elementary integral defined for step
functions.


Theorem 10.1. Let $U$ be a subspace of a normed linear space $V$, and let $T$
be a bounded linear mapping from $U$ to a Banach space $W$. Then $T$ has a
uniquely determined extension to a bounded linear transformation $S$ from
the closure $\bar{U}$ to $W$. Moreover, $\|S\| = \|T\|$.


Proof. Fix $\alpha \in \bar{U}$ and choose $\{\xi_n\} \subset U$ so that $\xi_n \to \alpha$. Then $\{\xi_n\}$ is Cauchy and
$\{T(\xi_n)\}$ is Cauchy (by the lemmas of Section 7), so that $\{T(\xi_n)\}$ converges
to some $\beta \in W$. If $\{\eta_n\}$ is any other sequence in $U$ converging to $\alpha$,
then $\xi_n - \eta_n \to 0$, $T(\xi_n) - T(\eta_n) = T(\xi_n - \eta_n) \to 0$, and so $T(\eta_n) \to \beta$ also.
Thus $\beta$ is independent of the sequence chosen, and, clearly, $\beta$ must be the value
$S(\alpha)$ at $\alpha$ of any continuous extension $S$ of $T$. If $\alpha \in U$, then $\beta = \lim T(\xi_n) =
T(\alpha)$ by the continuity of $T$. We thus have $S$ uniquely defined on $\bar{U}$ by the
requirement that it be a continuous extension of $T$.

It remains to be shown that $S$ is linear and bounded by $\|T\|$. For any $\alpha, \beta \in \bar{U}$
we choose $\{\xi_n\}, \{\eta_n\} \subset U$, so that $\xi_n \to \alpha$ and $\eta_n \to \beta$. Then $x\xi_n + y\eta_n \to
x\alpha + y\beta$, so that

$$S(x\alpha + y\beta) = \lim T(x\xi_n + y\eta_n) = x \lim T(\xi_n) + y \lim T(\eta_n) = xS(\alpha) + yS(\beta).$$

Thus $S$ is linear. Finally,

$$\|S(\alpha)\| = \lim \|T(\xi_n)\| \le \|T\| \lim \|\xi_n\| = \|T\| \cdot \|\alpha\|.$$

Thus $\|T\|$ is a bound for $S$, and, since $S$ includes $T$, $\|S\| = \|T\|$. □


The above theorem has many applications, but we shall use it only once, to
obtain the Riemann integral $\int_a^b f(t)\,dt$ of a continuous function $f$ mapping a
closed interval $[a, b]$ into a Banach space $W$ as an extension of the trivial integral
for step functions. If $W$ is a normed linear space and $f: [a, b] \to W$ is a
continuous function defined on a closed interval $[a, b] \subset \mathbb{R}$, we might expect to be
able to define $\int_a^b f(t)\,dt$ as a suitable vector in $W$ and to proceed with the integral
calculus of vector-valued functions of one real variable. We haven’t done this
until now because we need the completeness of $W$ to prove that the integral
exists!

At first we shall integrate only certain elementary functions called step
functions. A finite subset $A$ of $[a, b]$ which contains the two endpoints $a$ and $b$
will be called a partition of $[a, b]$. Thus $A$ is (the range of) some finite sequence
$\{t_i\}_0^n$, where $a = t_0 < t_1 < \cdots < t_n = b$, and $A$ subdivides $[a, b]$ into a sequence
of smaller intervals. To be definite, we shall take the open intervals $(t_{i-1}, t_i)$,
$i = 1, \ldots, n$, as the intervals of the subdivision. If $A$ and $B$ are partitions and
$A \subset B$, we shall say that $B$ is a refinement of $A$. Then each interval $(s_{j-1}, s_j)$ of
the $B$-subdivision is included in an interval $(t_{i-1}, t_i)$ of the $A$-subdivision; $t_{i-1}$
is the largest element of $A$ which is less than or equal to $s_{j-1}$, and $t_i$ is the smallest
greater than or equal to $s_j$. A step function is simply a map $f: [a, b] \to W$ which
is constant on the intervals of some subdivision $A = \{t_i\}_0^n$. That is, there exists
a sequence of vectors $\{\alpha_i\}_1^n$ such that $f(\xi) = \alpha_i$ when $\xi \in (t_{i-1}, t_i)$. The values
of $f$ at the subdividing points may be among these values or they may be different.

For each step function $f$ we define $\int_a^b f(t)\,dt$ as $\sum_{i=1}^n \alpha_i\,\Delta t_i$, where $f = \alpha_i$ on
$(t_{i-1}, t_i)$ and $\Delta t_i = t_i - t_{i-1}$. If $f$ were real-valued, this would be simply the
sum of the areas of the rectangles making up the region between the graph of $f$
and the $t$-axis. Now $f$ may be described as a step function in terms of many
different subdivisions. For example, if $f$ is constant on the intervals of $A$, and
if we obtain $B$ from $A$ by adding one new point $s$, then $f$ is constant on the
(smaller) intervals of $B$. We have to be sure that the value of the integral of $f$
doesn’t change when we change the describing subdivision. In the case just
mentioned this is easy to see. The one new point $s$ lies in some interval $(t_{i-1}, t_i)$
defined by the partition $A$. The contribution of this interval to the $A$-sum is
$\alpha_i(t_i - t_{i-1})$, while in the $B$-sum it splits into $\alpha_i(t_i - s) + \alpha_i(s - t_{i-1})$. But
this is the same vector. The remaining summands are the same in the two sums,
and the integral is therefore unchanged. In general, suppose that $f$ is a step
function with respect to $A$ and also with respect to $C$. Set $B = A \cup C$, the
“common refinement” of $A$ and $C$. We can pass from $A$ to $B$ in a sequence of
steps at each of which we add one new point. As we have seen, the integral
remains unchanged at each of these steps, and so it is the same for $A$ as for $B$.
It is similarly the same for $C$ and $B$, and so for $A$ and $C$. We have thus shown
that $\int_a^b f$ is independent of the subdivision used to define $f$.
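
The refinement argument can be checked on a small example. The following sketch is an illustration only, assuming $W = \mathbb{R}^k$ with vectors stored as tuples; the names `step_integral` and `refine` and the sample data are hypothetical. Exact rational arithmetic is used so that the two sums can be compared for equality.

```python
from fractions import Fraction


def step_integral(points, values):
    """Sum of a_i * (t_i - t_{i-1}), where the step function equals
    values[i-1] on the open interval (points[i-1], points[i])."""
    dim = len(values[0])
    total = [Fraction(0)] * dim
    for i, a in enumerate(values, start=1):
        dt = points[i] - points[i - 1]
        total = [s + c * dt for s, c in zip(total, a)]
    return tuple(total)


def refine(points, values, s):
    """Add one new point s strictly inside an interval; the value there is repeated."""
    for i in range(1, len(points)):
        if points[i - 1] < s < points[i]:
            return (points[:i] + [s] + points[i:],
                    values[:i] + [values[i - 1]] + values[i:])
    raise ValueError("s must lie strictly inside an interval of the partition")


points = [Fraction(0), Fraction(1), Fraction(3)]
values = [(1, 2), (-1, 0)]            # a step function into R^2
finer_points, finer_values = refine(points, values, Fraction(2))
print(step_integral(points, values))              # the A-sum
print(step_integral(finer_points, finer_values))  # the B-sum, the same vector
```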

Now fix $[a, b]$ and $W$, and let $\mathcal{E}$ be the set of all step functions from $[a, b]$
to $W$. Then $\mathcal{E}$ is a vector space. For, if $f$ and $g$ in $\mathcal{E}$ are step functions relative to
partitions $A$ and $B$, then both functions are constant on the intervals of $C =
A \cup B$, and therefore $xf + yg$ is also. Moreover, if $C = \{t_i\}_0^n$, and if on $(t_{i-1}, t_i)$
we have $f = \alpha_i$ and $g = \beta_i$, so that $xf + yg = x\alpha_i + y\beta_i$ there, then the equation

$$\sum_{i=1}^n (x\alpha_i + y\beta_i)\,\Delta t_i = x\left(\sum_{i=1}^n \alpha_i\,\Delta t_i\right) + y\left(\sum_{i=1}^n \beta_i\,\Delta t_i\right)$$

is just $\int_a^b (xf + yg) = x\int_a^b f + y\int_a^b g$. The map $f \mapsto \int_a^b f$ is thus linear from $\mathcal{E}$ to
$W$. Finally,

$$\left\|\int_a^b f\right\| = \left\|\sum_{i=1}^n \alpha_i\,\Delta t_i\right\| \le \sum_{i=1}^n \|\alpha_i\|\,\Delta t_i \le \|f\|_\infty (b - a),$$

where $\|f\|_\infty = \operatorname{lub}\{\|f(t)\| : t \in [a, b]\} = \max\{\|\alpha_i\| : 1 \le i \le n\}$. That is, if
we use on $\mathcal{E}$ the uniform norm defined from the norm of $W$, then the linear
mapping $f \mapsto \int_a^b f$ is bounded by $(b - a)$. If $W$ is complete, this transformation
therefore has a unique bounded linear extension to the closure $\bar{\mathcal{E}}$ of $\mathcal{E}$ in
$\mathcal{B}([a, b], W)$ by Theorem 10.1. But we can show that $\bar{\mathcal{E}}$ includes the space
$\mathcal{C}([a, b], W)$ of all continuous functions from $[a, b]$ to $W$, and the integral of a
continuous function is thus uniquely defined.


Lemma 10.1. $\mathcal{C}([a, b], W) \subset \bar{\mathcal{E}}$.


Proof. A continuous function $f$ on $[a, b]$ is uniformly continuous (Theorem 5.1).
That is, given $\epsilon$, there exists $\delta$ such that $|s - t| < \delta \Rightarrow \|f(s) - f(t)\| < \epsilon$.
Now take any partition $A = \{t_i\}_0^n$ on $[a, b]$ such that $\Delta t_i = t_i - t_{i-1} < \delta$ for all
$i$, and take $\alpha_i$ as any value of $f$ on $(t_{i-1}, t_i)$. Then $\|f(t) - \alpha_i\| < \epsilon$ on $[t_{i-1}, t_i]$.
Thus, if $g$ is the step function with value $\alpha_i$ on $(t_{i-1}, t_i]$ and $g(a) = \alpha_1$, then
$\|f - g\|_\infty \le \epsilon$. Thus $f$ is in $\bar{\mathcal{E}}$, as desired. □


Our main theorem is a recapitulation. 


Theorem 10.2. If $W$ is a Banach space and $V = \mathcal{C}([a, b], W)$ under the
uniform norm, then there exists a $J \in \operatorname{Hom}(V, W)$ uniquely determined by
setting $J(f) = \lim \int_a^b f_n$, where $\{f_n\}$ is any sequence in $\mathcal{E}$ converging to $f$
and $\int_a^b f_n$ is the integral on $\mathcal{E}$ defined above. Moreover, $\|J\| \le (b - a)$.
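
As a rough numerical illustration of this definition (a sketch under stated assumptions, not the text's construction verbatim), take $W = \mathbb{R}^2$ with the Euclidean norm and approximate a continuous arc by step functions on uniform partitions. The names `step_approx_integral` and `arc`, the choice of right-endpoint values, and the interval $[0, \pi/2]$ are all illustrative.

```python
import math


def step_approx_integral(f, a, b, n):
    """Integral of the step function equal to f(t_i) on (t_{i-1}, t_i],
    for the uniform partition of [a, b] into n intervals."""
    dt = (b - a) / n
    dim = len(f(a))
    total = [0.0] * dim
    for i in range(1, n + 1):
        value = f(a + i * dt)
        total = [s + c * dt for s, c in zip(total, value)]
    return tuple(total)


def arc(t):
    """A continuous arc in R^2 whose Euclidean norm is 1 for every t."""
    return (math.cos(t), math.sin(t))


a, b = 0.0, math.pi / 2
for n in (10, 100, 1000):
    # The step integrals approach the vector <1, 1> as the partition is refined.
    print(n, step_approx_integral(arc, a, b, n))
```

The limit has Euclidean norm $\sqrt{2}$, which is consistent with the bound $(b - a)\|f\|_\infty = \pi/2$.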

If $f$ is elementary from $[a, b]$ to $W$ and $c \in (a, b)$, then of course $f$ is elementary
on each of $[a, c]$ and $[c, b]$. If $c$ is added to a subdivision $A$ used in defining $f$, and
if the sum defining $\int_a^b f$ with respect to $B = A \cup \{c\}$ is broken into two sums
at $c$, we clearly have $\int_a^b f = \int_a^c f + \int_c^b f$. This same identity then follows for any
continuous function $f$ on $[a, b]$, since

$$\int_a^b f = \lim \int_a^b f_n = \lim\left(\int_a^c f_n + \int_c^b f_n\right) = \lim \int_a^c f_n + \lim \int_c^b f_n = \int_a^c f + \int_c^b f.$$

The fundamental theorem of the calculus is still with us.

Theorem 10.3. If $f \in \mathcal{C}([a, b], W)$ and $F: [a, b] \to W$ is defined by $F(x) =
\int_a^x f(t)\,dt$, then $F'$ exists on $(a, b)$ and $F'(x) = f(x)$.


Proof. By the continuity of $f$ at $x_0$, for every $\epsilon$ there exists a $\delta$ such that

$$\|f(x_0) - f(x)\| < \epsilon$$

whenever $|x - x_0| < \delta$. But then

$$\left\|\int_{x_0}^x \bigl(f(x_0) - f(t)\bigr)\,dt\right\| \le \epsilon\,|x - x_0|,$$

and since $\int_{x_0}^x f(x_0)\,dt = f(x_0)(x - x_0)$ by the definition of the integral for an
elementary function, we see that

$$\left\|f(x_0) - \left(\int_{x_0}^x f(t)\,dt\right)\Big/(x - x_0)\right\| \le \epsilon.$$

Since $\int_{x_0}^x f(t)\,dt = F(x) - F(x_0)$, this is exactly the statement that the difference
quotient for $F$ converges to $f(x_0)$, as was to be proved. □


EXERCISES 


10.1 Prove the following analogue of Theorem 10.1. Let $A$ be a subset of a metric
space $B$, let $C$ be a complete metric space, and let $F: A \to C$ be uniformly continuous.
Then $F$ extends uniquely to a continuous map from $\bar{A}$ to $C$.


10.2 In Exercises 7.16 through 7.18 we have constructed a completion of $S$, namely,
$\overline{\theta[S]}$ in $V^*$. Prove that this completion is unique to within isometry. That is, supposing
that $\varphi$ is some other isometric imbedding of $S$ in a complete space $X$, show that the
identification of the two images of $S$ by $\varphi \circ \theta^{-1}$ (from $\theta[S]$ to $\varphi[S]$) extends to an
isometric bijection from $\overline{\theta[S]}$ to $\overline{\varphi[S]}$. [Hint: Apply the above exercise.]


10.3 Suppose that $S$ is a normed linear space $X$ and that $X$ is a dense subset of a
complete metric space $Y$. This means, remember, that every point of $Y$ is the limit of a
sequence lying in the subset $X$. Prove that the vector space structure of $X$ extends in a
unique way to make $Y$ a Banach space. Since we know from Exercise 7.18 that a metric
space can be completed, this shows again that a normed linear space can always be
completed to a Banach space.


10.4 In the elementary calculus, if $f$ is continuous, then

$$\int_a^b f(t)\,dt = f(x)(b - a)$$

for some $x$ in $(a, b)$. Show that this is not true for vector-valued continuous functions $f$
by considering the arc $f: [0, \pi] \to \mathbb{R}^2$ defined by $f(t) = \langle \sin t, \cos t\rangle$.


10.5 Show that integration commutes with the application of linear transformations.
That is, show that if $f$ is a continuous function from $[a, b]$ to a Banach space $W$, and if
$T \in \operatorname{Hom}(W, X)$, where $X$ is a Banach space, then

$$\int_a^b T(f(t))\,dt = T\left[\int_a^b f(t)\,dt\right].$$

[Hint: Make the computation directly for step functions.]

10.6 State and prove the theorem suggested by the following identity:

$$\int_a^b \langle f(t), g(t)\rangle\,dt = \left\langle \int_a^b f(t)\,dt,\ \int_a^b g(t)\,dt \right\rangle.$$

(Apply the above exercise.)


10.7 Let $W$ be any normed linear space, $\{\alpha_i\}_1^n$ a finite set of vectors in $W$, and
$\{f_i\}_1^n$ a corresponding set of real-valued continuous functions on $[a, b]$. Define the
arc $\gamma$ by

$$\gamma(t) = \sum_{i=1}^n f_i(t)\,\alpha_i.$$

Prove that $\int_a^b \gamma(t)\,dt$ exists and equals

$$\sum_{i=1}^n \left[\int_a^b f_i(t)\,dt\right]\alpha_i.$$


10.8 Let $f$ be a continuous function from $\mathbb{R}^2$ to a Banach space $W$. Describe how
one might set up a theory of a double integral

$$\iint_{I \times J} f(s, t)\,ds\,dt,$$

where $I \times J$ is a closed rectangle.

10.9 Prove that if $f_n$ converges uniformly to $f$, then

$$\int_a^b f_n(t)\,dt \to \int_a^b f(t)\,dt.$$

This is trivial if you have understood the definition and properties of the integral.


10.10 Suppose that $\{f_n\}$ is a sequence of smooth arcs from $[a, b]$ to a Banach space $W$
such that $\sum_1^\infty f_n'(t)$ is uniformly convergent. Suppose also that $\sum_1^\infty f_n(a)$ is convergent.
Prove that then $\sum f_n(t)$ is uniformly convergent, that $f = \sum_1^\infty f_n$ is smooth, and that
$f' = \sum f_n'$. (Use the above exercise and the fundamental theorem of the calculus.)


10.11 Prove that even if $W$ is not a Banach space, if the arc $f: [a, b] \to W$ has a
continuous derivative, then $\int_a^b f'$ exists and equals $f(b) - f(a)$.


10.12 Let $X$ be a normed linear space, and set $(l, \xi) = l(\xi)$ for $\xi \in X$ and $l \in X^*$.
Now let $f$ and $g$ be continuously differentiable functions (arcs) from the closed interval
$[a, b]$ to $X$ and $X^*$, respectively. Prove the integration by parts formula:

$$(g(b), f(b)) - (g(a), f(a)) = \int_a^b (g(t), f'(t))\,dt + \int_a^b (g'(t), f(t))\,dt.$$

[Hint: Apply Theorem 8.4 from Chapter 3.]


10.13 State the generalization of the above integration by parts formula that holds
for any bounded bilinear mapping $\omega: V \times W \to X$, where $X$ is a Banach space.


10.14 Let $t \mapsto l_t$ be a fixed continuous map from a closed interval $[a, b]$ to the dual $W^*$
of a Banach space $W$. Suppose that for any continuous map $g$ from $[a, b]$ to $W$

$$\int_a^b g(t)\,dt = 0 \;\Rightarrow\; \int_a^b l_t(g(t))\,dt = 0.$$

Show that there exists a fixed $L \in W^*$ such that

$$\int_a^b l_t(g(t))\,dt = L\left(\int_a^b g(t)\,dt\right)$$

for all continuous arcs $g: [a, b] \to W$. Show that it then follows that $l_t = L$ for all $t$.

10.15 Use the above exercise to deduce the general Euler equation of Section 3.15.


11. THE COMPLEX NUMBER SYSTEM


The complex number system C is the third basic number field that must be 
studied, after the rational numbers and the real numbers, and the reader surely 
has had some contact with it in the past. 

Almost everybody views a complex number $\zeta$ as being equivalent to a pair
of real numbers, the “real and imaginary parts” of $\zeta$, and the complex number
system $\mathbb{C}$ is thus viewed as being Cartesian 2-space $\mathbb{R}^2$ with some further
structure. In particular, a complex-valued function is simply a certain kind of
vector-valued function, and is equivalent to an ordered pair of real-valued functions,
again its real and imaginary parts.

What distinguishes the complex number system $\mathbb{C}$ from its vector substratum
$\mathbb{R}^2$ is the presence of an additional operation, complex multiplication. The
vector operations of $\mathbb{R}^2$ together with this complex multiplication operation
make $\mathbb{C}$ into a commutative algebra. Moreover, it turns out that $\langle 1, 0\rangle$ is the
unique multiplicative identity in $\mathbb{C}$ and that every nonzero complex number $\zeta$
has a multiplicative inverse. These additional facts are summarized by saying
that $\mathbb{C}$ is a field, and they allow us to use $\mathbb{C}$ as a new scalar field in vector space
theory. In fact, the whole development of Chapters 1 and 2 remains valid when
$\mathbb{R}$ is replaced everywhere by $\mathbb{C}$. Scalar multiplication is now multiplication by
complex numbers. Thus $\mathbb{C}^n$ is the vector space of ordered $n$-tuples of complex
numbers $\langle \zeta_1, \ldots, \zeta_n\rangle$, and the product of an $n$-tuple by a complex scalar $\alpha$
is defined by $\alpha\langle \zeta_1, \ldots, \zeta_n\rangle = \langle \alpha\zeta_1, \ldots, \alpha\zeta_n\rangle$, where $\alpha\zeta_i$ is complex
multiplication.


It is time to come to grips with complex multiplication. As the reader
probably knows, it is given by an odd-looking formula that is motivated by thinking
of an element $\xi = \langle x_1, x_2\rangle$ as being in the form $x_1 + ix_2$, where $i^2 = -1$,
and then using the ordinary laws of algebra. Then we have

$$\begin{aligned}
\xi\eta &= (x_1 + ix_2)(y_1 + iy_2)\\
&= x_1y_1 + ix_1y_2 + ix_2y_1 + i^2x_2y_2 = (x_1y_1 - x_2y_2) + i(x_1y_2 + x_2y_1),
\end{aligned}$$

and thus our definition is

$$\langle x_1, x_2\rangle\langle y_1, y_2\rangle = \langle x_1y_1 - x_2y_2,\ x_1y_2 + x_2y_1\rangle.$$


Of course, it has to be verified that this operation is commutative and satisfies 
the laws for an algebra. A straightforward check is possible but dull, and we 
shall indicate a neater way in the exercises. 
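
The dull check can at least be spot-checked by machine. The sketch below is an illustration, not a proof: it implements the product formula $\langle x_1, x_2\rangle\langle y_1, y_2\rangle = \langle x_1y_1 - x_2y_2,\ x_1y_2 + x_2y_1\rangle$ on pairs and tests the algebra laws on a few integer samples. The name `cmul` and the sample pairs are hypothetical.

```python
def cmul(z, w):
    """<x1, x2><y1, y2> = <x1*y1 - x2*y2, x1*y2 + x2*y1>."""
    x1, x2 = z
    y1, y2 = w
    return (x1 * y1 - x2 * y2, x1 * y2 + x2 * y1)


def cadd(z, w):
    """Componentwise (vector) addition of pairs."""
    return (z[0] + w[0], z[1] + w[1])


z, w, q = (1, 2), (3, -4), (-2, 5)
print(cmul(z, w))                                    # (11, 2)
print(cmul(z, w) == cmul(w, z))                      # commutativity on this sample
print(cmul(cmul(z, w), q) == cmul(z, cmul(w, q)))    # associativity on this sample
print(cmul(z, cadd(w, q)) == cadd(cmul(z, w), cmul(z, q)))  # distributivity
print(cmul((0, 1), (0, 1)))                          # (-1, 0)
```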

The mapping $x \mapsto \langle x, 0\rangle$ is an isomorphic injection of the field $\mathbb{R}$ into the
field $\mathbb{C}$. It clearly preserves sums, and the reader can check in his mind that it
also preserves products. It is conventional to identify $x$ with its image $\langle x, 0\rangle$,
and so to view $\mathbb{R}$ as a subfield of $\mathbb{C}$.

The mysterious $i$ can be identified in $\mathbb{C}$ as the pair $\langle 0, 1\rangle$, since then
$i^2 = \langle 0, 1\rangle\langle 0, 1\rangle = \langle -1, 0\rangle$, which we have identified with $-1$. With
these identifications we have $\langle x, y\rangle = \langle x, 0\rangle + \langle 0, y\rangle = \langle x, 0\rangle +
\langle 0, 1\rangle\langle y, 0\rangle = x + iy$, and this is the way we shall write complex numbers
from now on.

The mapping $x + iy \mapsto x - iy$ is a field isomorphism of $\mathbb{C}$ with itself.
That is, it preserves both sums and products, as the reader can easily check.
Such a self-isomorphism is called an automorphism. The above automorphism is
called complex conjugation, and the image $x - iy$ of $\zeta = x + iy$ is called the
conjugate of $\zeta$, and is designated $\bar\zeta$. We shall ask the reader to show in an exercise
that conjugation is the only automorphism of $\mathbb{C}$ (except the identity
automorphism) which leaves the elements of the subfield $\mathbb{R}$ fixed.

The Euclidean norm of $\zeta = x + iy = \langle x, y\rangle$ is called the absolute value
of $\zeta$, and is designated $|\zeta|$, so that $|\zeta| = |x + iy| = (x^2 + y^2)^{1/2}$. This is
reasonable because it then turns out that $|\zeta\gamma| = |\zeta|\,|\gamma|$. This can be verified by
squaring and multiplying, but it is much more elegant first to notice the
relationship between absolute value and the conjugation automorphism, namely,

$$\zeta\bar\zeta = |\zeta|^2$$

[$(x + iy)(x - iy) = x^2 - (iy)^2 = x^2 + y^2$]. Then $|\zeta\gamma|^2 = (\zeta\gamma)(\overline{\zeta\gamma}) = (\zeta\bar\zeta)(\gamma\bar\gamma) =
|\zeta|^2|\gamma|^2$, and taking square roots gives us our identity. The identity $\zeta\bar\zeta = |\zeta|^2$
also shows us that if $\zeta \ne 0$, then $\bar\zeta/|\zeta|^2$ is its multiplicative inverse.
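
A small sketch of these identities on pairs with integer entries, using exact rational arithmetic for the inverse. The names `cmul`, `conj`, `abs_sq`, and `inverse` and the sample values are illustrative assumptions, and the code is a spot check rather than a proof.

```python
from fractions import Fraction


def cmul(z, w):
    """Complex multiplication on pairs <x, y>."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def conj(z):
    """The conjugate of x + iy is x - iy."""
    return (z[0], -z[1])


def abs_sq(z):
    """|z|^2 = x^2 + y^2."""
    return z[0] ** 2 + z[1] ** 2


def inverse(z):
    """conj(z) / |z|^2, for a nonzero pair z with integer entries."""
    n = abs_sq(z)
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    c = conj(z)
    return (Fraction(c[0], n), Fraction(c[1], n))


z, w = (3, 4), (1, -2)
print(cmul(z, conj(z)))                               # (25, 0), that is, |z|^2
print(abs_sq(cmul(z, w)) == abs_sq(z) * abs_sq(w))    # |zw|^2 = |z|^2 |w|^2
print(cmul(z, inverse(z)) == (1, 0))                  # z times its inverse is 1
```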

Because the real number system $\mathbb{R}$ is a subfield of the complex number
system $\mathbb{C}$, any vector space over $\mathbb{C}$ is automatically also a vector space over $\mathbb{R}$:
multiplication by complex scalars includes multiplication by real scalars. And
any complex linear transformation between complex vector spaces is
automatically real linear. The converse, of course, does not hold. For example, a
real linear mapping $T$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ is not in general complex linear from $\mathbb{C}$ to $\mathbb{C}$,
nor does a real linear $S$ in $\operatorname{Hom} \mathbb{R}^4$ become a complex linear mapping in $\operatorname{Hom} \mathbb{C}^2$
when $\mathbb{R}^4$ is viewed as $\mathbb{C}^2$. We shall study this question in the exercises.

The complex differentiability of a mapping $F$ between complex vector spaces
has the obvious definition $\Delta F_\beta = T + o$, where $T$ is complex linear, and then
$F$ is also real differentiable, in view of the above remarks. But $F$ may be real
differentiable without being complex differentiable. It follows from the
discussion at the end of Section 8 that if $\{a_n\} \subset \mathbb{C}$ and $\{|a_n|\delta^n\}$ is bounded, then the
series $\sum a_n\zeta^n$ converges on the ball $B_\delta(0)$ in the (real) Banach algebra $\mathbb{C}$, and
$F(\zeta) = \sum_0^\infty a_n\zeta^n$ is real differentiable on this ball, with
$dF_\beta(\zeta) = \left(\sum_1^\infty n a_n\beta^{n-1}\right)\zeta = F'(\beta)\cdot\zeta$.
But multiplication by $F'(\beta)$ is obviously a complex linear operation
on the one-dimensional complex vector space $\mathbb{C}$. Therefore, complex-valued
functions defined by convergent complex power series are automatically
complex differentiable. But we can go even further. In this case, if $\zeta \ne 0$, we can
divide by $\zeta$ in the defining equation

$$\Delta F_\beta(\zeta) = F'(\beta)\cdot\zeta + o(\zeta)$$

to get the result that

$$\frac{\Delta F_\beta(\zeta)}{\zeta} \to F'(\beta) \quad\text{as } \zeta \to 0.$$

That is, $F'(\beta)$ is now an honest derivative again, with the complex infinitesimal $\zeta$
in the denominator of the difference quotient.

The consequences of complex differentiability are incalculable, and we shall
mostly leave them as future pleasures to be experienced in a course on functions
of complex variables. See, however, the problems on the residue calculus at the
end of Chapter 12 and the proof in Chapter 11, Exercise 4.3, of the following
fundamental theorem of algebra.


Theorem. Every polynomial with complex coefficients is a product of
linear factors.


A weaker but equivalent statement is that every polynomial has at least one
(complex) root. The crux of the matter is that $x^2 + 1$ cannot be factored over $\mathbb{R}$
(i.e., it has no real root), but over $\mathbb{C}$ we have $x^2 + 1 = (x + i)(x - i)$, with the
two roots $\pm i$.

For later use we add a few more words about the complex exponential
function $\exp \zeta = e^\zeta = \sum_0^\infty \zeta^n/n!$. If $\zeta = x + iy$, we have $e^\zeta = e^{x+iy} = e^xe^{iy}$, and
$e^{iy} = \sum_0^\infty (iy)^n/n! = (1 - y^2/2! + y^4/4! - \cdots) + i(y - y^3/3! + y^5/5! - \cdots) =
\cos y + i\sin y$. Thus $e^{x+iy} = e^x(\cos y + i\sin y)$. That is, the real and imaginary
parts of the complex-valued function $\exp(x + iy)$ are $e^x\cos y$ and $e^x\sin y$,
respectively.
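
The identity $e^{x+iy} = e^x(\cos y + i\sin y)$ can be checked numerically by summing the power series directly. This is a minimal sketch; the name `exp_series`, the cutoff of 40 terms, and the sample point are illustrative assumptions.

```python
import cmath
import math


def exp_series(z, terms=40):
    """Partial sum of the series sum_{n >= 0} z^n / n! for a complex number z."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turns z^n / n! into z^(n+1) / (n+1)!
    return total


x, y = 0.5, 2.0
series_value = exp_series(complex(x, y))
closed_form = math.exp(x) * complex(math.cos(y), math.sin(y))
print(series_value)
print(closed_form)
print(abs(series_value - cmath.exp(complex(x, y))))  # difference from the library value
```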


EXERCISES 


11.1 Prove the associativity of complex multiplication directly from its definition.

11.2 Prove the distributive law,

$$\alpha(\xi + \eta) = \alpha\xi + \alpha\eta,$$

for complex numbers.

11.3 Show that scalar multiplication by a real number $a$, $a\langle x, y\rangle = \langle ax, ay\rangle$, in
$\mathbb{C} = \mathbb{R}^2$ is consistent with the interpretation of $a$ as the complex number $\langle a, 0\rangle$ and
the definition of complex multiplication.

11.4 Let $\theta$ be an automorphism of the complex number field leaving the real numbers
fixed. Prove that $\theta$ is either the identity or complex conjugation. [Hint: $(\theta(i))^2 =
\theta(i^2) = \theta(-1) = -1$. Show that the only complex numbers $x + iy$ whose squares are
$-1$ are $\pm i$, and then finish up.]

11.5 If we remember that $\mathbb{C}$ is in particular the two-dimensional real vector space $\mathbb{R}^2$,
we see that multiplying the elements of $\mathbb{C}$ by the complex number $a + ib$ must define
a linear transformation on $\mathbb{R}^2$. Show that its matrix is

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

11.6 The above exercise suggests that the complex number system may be like the
set $A$ of all $2 \times 2$ real matrices of the form

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$

Prove that $A$ is a subalgebra of the matrix algebra $\mathbb{R}^{2\times 2}$ (that is, $A$ is closed under
multiplication, addition, and scalar multiplication) and that the mapping

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \mapsto a + ib$$

is a bijection from $A$ to $\mathbb{C}$ that preserves all algebra operations. We therefore can
conclude that the laws of an algebra automatically hold for $\mathbb{C}$. Why?


11.7 In the above matrix model of the complex number system show that the
absolute value identity $|\zeta\gamma| = |\zeta|\,|\gamma|$ is a determinant property.

11.8 Let $W$ be a real vector space, and let $V$ be the real vector space $W \times W$.
Show that there is a $\theta$ in $\operatorname{Hom} V$ such that $\theta^2 = -I$. (Think of $\mathbb{C}$ as being the real
vector space $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ under multiplication by $i$.)

11.9 Let $V$ be a real vector space, and let $\theta$ in $\operatorname{Hom} V$ satisfy $\theta^2 = -I$. Show that $V$
becomes a complex vector space if $i\alpha$ is defined as $\theta(\alpha)$. If the complex vector space $V$
is made from the real vector space $W$ as in this and the above exercise, we shall call
$V$ the complexification of $W$. We shall regard $W$ itself as being a real subspace of $V$
(actually $W \times \{0\}$), and then $V = W \oplus iW$.

11.10 Show that the complex vector space $\mathbb{C}^n$ is the complexification of $\mathbb{R}^n$. Show
more generally that for any set $A$ the complex vector space $\mathbb{C}^A$ is the complexification
of the real vector space $\mathbb{R}^A$.

11.11 Let $V$ be the complexification of the real vector space $W$. Define the operation
of complex conjugation on $V$. That is, show that there is a real linear mapping $\varphi$ such
that $\varphi^2 = I$ and $\varphi(i\alpha) = -i\varphi(\alpha)$. Show, conversely, that if $V$ is a complex vector
space and $\varphi$ is a conjugation on $V$ [a real linear mapping $\varphi$ such that $\varphi^2 = I$ and
$\varphi(i\alpha) = -i\varphi(\alpha)$], then $V$ is (isomorphic to) the complexification of a real linear space
$W$. (Apply Theorem 5.5 of Chapter 1 to the identity $\varphi^2 - I = 0$.)

11.12 Let $W$ be a real vector space, and let $V$ be its complexification. Show that
every $T$ in $\operatorname{Hom} W$ “extends” to a complex linear $S$ in $\operatorname{Hom} V$ which commutes with the
conjugation $\varphi$. By $S$ extending $T$ we mean, of course, that $S \restriction (W \times \{0\}) = T$.
Show, conversely, that if $S$ in $\operatorname{Hom} V$ commutes with conjugation, then $S$ is the
extension of a $T$ in $\operatorname{Hom} W$.

11.13 In this situation we naturally call $S$ the complexification of $T$. Show finally
that if $S$ is the complexification of $T$, then its null space $X$ in $V$ is the direct sum
$X = N \oplus iN$, where $N$ is the null space of $T$ in $W$. Remember that we are viewing $V$
as $W \oplus iW$.

11.14 On a complex normed linear space $V$ the norm is required to be complex
homogeneous:

$$\|\lambda\alpha\| = |\lambda| \cdot \|\alpha\|$$

for all complex numbers $\lambda$. Show that the natural definitions of $\|\ \|_1$, $\|\ \|_2$, and $\|\ \|_\infty$
on $\mathbb{C}^n$ have this property.

11.15 If a real normed linear space $W$ is complexified to $V = W \oplus iW$, there is no
trivial formula which converts the real norm for $W$ into a complex norm for $V$. Show
that, nevertheless, any product norm on $V$ (which really is $W \times W$) can be used to
generate an equivalent complex norm. [Hint: Given $\langle \xi, \eta\rangle \in V$, consider the set of
numbers $\{\|(x + iy)\langle \xi, \eta\rangle\| : |x + iy| = 1\}$, and try to obtain from this set a single
number that works.]

11.16 Show that every nonzero complex number has a logarithm. That is, show that
if $u + iv \ne 0$, then there exists an $x + iy$ such that $e^{x+iy} = u + iv$. (Write the equation
$e^x(\cos y + i\sin y) = u + iv$, and solve by being slightly clever.)

11.17 The fundamental theorem of algebra and Theorem 5.5 of Chapter 1 imply
that if $V$ is a complex vector space and $T$ in $\operatorname{Hom} V$ satisfies $p(T) = 0$ for a polynomial
$p$, then there are subspaces $\{V_i\}_1^n$ of $V$, complex numbers $\{\lambda_i\}_1^n$, and integers $\{m_i\}_1^n$ such
that $V = \bigoplus_1^n V_i$, $V_i$ is $T$-invariant for each $i$, and $(T - \lambda_iI)^{m_i} = 0$ on $V_i$ for each $i$.
Show that this is so. Show also that if $V$ is finite-dimensional, then every $T$ in $\operatorname{Hom} V$
must satisfy some polynomial equation $p(T) = 0$. (Consider the linear independence or
dependence of the vectors $I, T, T^2, \ldots, T^n, \ldots$, in the vector space $\operatorname{Hom} V$.)

11.18 Suppose that the polynomial $p$ in the above exercise has real coefficients. Use
the fact that complex conjugation is an automorphism of $\mathbb{C}$ to prove that if $\lambda$ is a root
of $p$, then so is $\bar\lambda$.

Show that if $V$ is the complexification of a real space $W$ and $T$ is the
complexification of $R \in \operatorname{Hom} W$, then there exists a real polynomial $p$ such that $p(T) = 0$.

11.19 Show that if $W$ is a finite-dimensional real vector space and $R \in \operatorname{Hom} W$ is an
isomorphism, then there exists an $A \in \operatorname{Hom} W$ such that $R = e^A$ (that is, $\log R$ exists).
This is a hard exercise, but it can be proved from Exercises 8.19 through 8.23, 11.12,
11.17, and 11.18.


*12. WEAK METHODS


Our theorem that all norms are equivalent on a finite-dimensional space suggests
that the limit theory of such spaces should be accessible independently of norms,
and our earlier theorem that every linear transformation with a finite-dimensional
domain is automatically bounded reinforces this impression. We shall look
into this question in this section. In a sense this effort is irrelevant, since we
can’t do without norms completely, and since they are so handy that we use
them even when we don’t have to.

Roughly speaking, what we are going to do is to study a vector-valued map $F$
by studying the whole collection of real-valued maps $\{l \circ F : l \in V^*\}$.


Theorem 12.1. If $V$ is finite-dimensional, then $\xi_n \to \xi$ in $V$ (with respect to
any, and so every, norm) if and only if $l(\xi_n) \to l(\xi)$ in $\mathbb{R}$ for each $l$ in $V^*$.


Proof. If $\xi_n \to \xi$ and $l \in V^*$, then $l(\xi_n) \to l(\xi)$, since $l$ is automatically
continuous. Conversely, if $l(\xi_n) \to l(\xi)$ for every $l$ in $V^*$, then, choosing a basis
$\{\beta_i\}_1^n$ for $V$, we have $\epsilon_i(\xi_n) \to \epsilon_i(\xi)$ for each functional $\epsilon_i$ in the dual basis, and
this implies that $\xi_n \to \xi$ in the associated one-norm, since
$\|\xi_n - \xi\|_1 = \sum_1^n |\epsilon_i(\xi_n) - \epsilon_i(\xi)| \to 0$. □


Remark. If $V$ is an arbitrary normed linear space, so that $V^* = \operatorname{Hom}(V, \mathbb{R})$
is the set of bounded linear functionals, then we say that $\xi_n \to \xi$ weakly if
$l(\xi_n) \to l(\xi)$ for each $l \in V^*$. The above theorem can therefore be rephrased to
say that in a finite-dimensional space, weak convergence and norm convergence are
equivalent notions.


We shall now see that in a similar way the integration and differentiation of
parametrized arcs can all be thrown back to the standard calculus of real-valued
functions of a real variable by applying functionals from $V^*$ and using the
natural isomorphism of $V^{**}$ with $V$. Thus, if $f \in \mathcal{C}([a, b], V)$ and $\lambda \in V^*$, then
$\lambda \circ f \in \mathcal{C}([a, b], \mathbb{R})$, and so the integral $\int_a^b \lambda \circ f$ exists from standard calculus. If we
vary $\lambda$, we can check that the map $\lambda \mapsto \int_a^b \lambda \circ f$ is linear, hence is in $V^{**}$, and
therefore is given by a uniquely determined vector $\alpha \in V$ (by duality; see
Chapter 2, Theorem 3.2). That is, there exists a unique $\alpha \in V$ such that
$\lambda(\alpha) = \int_a^b \lambda \circ f$ for every $\lambda \in V^*$, and we define this $\alpha$ to be $\int_a^b f$. Thus
integration is defined so as to commute with the application of linear functionals:
$\int_a^b f$ is that vector such that

$$\lambda\left(\int_a^b f\right) = \int_a^b \lambda(f(t))\,dt \quad\text{for all } \lambda \in V^*.$$
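
In $V = \mathbb{R}^n$ this definition can be made concrete: applying the dual-basis functionals shows that the weak integral is computed coordinate by coordinate, and every other functional then commutes with it by linearity. The sketch below is an illustration under these assumptions; the midpoint rule stands in for the standard real integral, and the names `integrate_real`, `weak_integral`, `apply_functional`, and the sample arc are hypothetical.

```python
import math


def integrate_real(g, a, b, n=2000):
    """Midpoint-rule approximation of an ordinary real integral of g over [a, b]."""
    dt = (b - a) / n
    return sum(g(a + (i + 0.5) * dt) for i in range(n)) * dt


def weak_integral(f, a, b, dim):
    """The vector alpha in R^dim whose i-th coordinate is the integral of the
    i-th coordinate function of f (the dual-basis functional applied to f)."""
    return tuple(integrate_real(lambda t, i=i: f(t)[i], a, b) for i in range(dim))


def apply_functional(lam, v):
    """A linear functional on R^dim given by its coefficients lam."""
    return sum(c * x for c, x in zip(lam, v))


def f(t):
    """A sample continuous arc in R^3."""
    return (t, t * t, math.sin(t))


lam = (2.0, -1.0, 3.0)
alpha = weak_integral(f, 0.0, 1.0, 3)
lhs = apply_functional(lam, alpha)                                  # lambda(integral of f)
rhs = integrate_real(lambda t: apply_functional(lam, f(t)), 0.0, 1.0)  # integral of lambda o f
print(alpha)
print(lhs, rhs)   # the two numbers agree up to rounding
```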


Similarly, if all the real-valued functions $\{\lambda \circ f : \lambda \in V^*\}$ are differentiable
at $x_0$, then the mapping $\lambda \mapsto (\lambda \circ f)'(x_0)$ is linear by the linearity of the derivative
in the standard calculus:

$$((c_1\lambda_1 + c_2\lambda_2) \circ f)' = (c_1(\lambda_1 \circ f) + c_2(\lambda_2 \circ f))' = c_1(\lambda_1 \circ f)' + c_2(\lambda_2 \circ f)'.$$

Therefore, there is again a unique $\alpha \in V$ such that

$$(\lambda \circ f)'(x_0) = \lambda(\alpha) \quad\text{for all } \lambda \in V^*,$$

and if we define this $\alpha$ to be the derivative $f'(x_0)$, we have again defined an
operation of the calculus by commutativity with linear functionals:

$$(\lambda \circ f')(x_0) = (\lambda \circ f)'(x_0).$$


Now the fundamental theorem of the calculus appears as follows.

If $F(x) = \int_a^x f$, then $(\lambda \circ F)(x) = \int_a^x \lambda \circ f$ by the weak definition of the
integral. The fundamental theorem of the standard calculus then says that
$(\lambda \circ F)'$ exists and $(\lambda \circ F)'(x) = (\lambda \circ f)(x) = \lambda(f(x))$. By the weak definition of
the derivative we then have that $F'$ exists and $F'(x) = f(x)$.

The one conclusion that we don’t get so easily by weak methods is the norm
inequality $\left\|\int_a^b f\right\| \le (b - a)\|f\|_\infty$. This requires a theorem about norms on
finite-dimensional spaces that we shall not prove in this course.


Theorem 12.2. $\|\alpha^{**}\| = \|\alpha\|$ for each $\alpha \in V$.


What is being asserted is that $\operatorname{lub} |\alpha^{**}(\lambda)|/\|\lambda\| = \|\alpha\|$. Since $\alpha^{**}(\lambda) = \lambda(\alpha)$,
and since $|\lambda(\alpha)| \le \|\lambda\| \cdot \|\alpha\|$ by the definition of $\|\lambda\|$, we see that

$$\operatorname{lub} |\alpha^{**}(\lambda)|/\|\lambda\| \le \|\alpha\|.$$

Our problem is therefore to find $\lambda \in V^*$ with $\|\lambda\| = 1$ and $|\lambda(\alpha)| = \|\alpha\|$. If we
multiply through by a suitable constant (replacing $\alpha$ by $c\alpha$, where $c = 1/\|\alpha\|$),
we can suppose that $\|\alpha\| = 1$. Then $\alpha$ is on the unit spherical surface, and the
problem is to find a functional $\lambda \in V^*$ such that the affine subspace (hyperplane)
where $\lambda = 1$ touches the unit sphere at $\alpha$ (so that $\lambda(\alpha) = 1$) and otherwise
lies outside the unit sphere (so that $|\lambda(\xi)| \le 1$ when $\|\xi\| = 1$, and hence
$\|\lambda\| \le 1$). It is clear geometrically that such “tangent planes” must exist, but
we shall drop the matter here.


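If we assume this theorem, then, since

\[ \Bigl|\lambda\Bigl(\int_a^b f\Bigr)\Bigr| = \Bigl|\int_a^b \lambda(f(t))\,dt\Bigr| \le (b - a)\max\{|\lambda(f(t))| : t \in [a, b]\} \]
\[ \le (b - a)\|\lambda\| \max\{\|f(t)\|\} \quad (\text{from } |\lambda(\alpha)| \le \|\lambda\| \cdot \|\alpha\|) \]
\[ = (b - a)\|\lambda\| \cdot \|f\|_\infty, \]

we get

\[ \Bigl\|\int_a^b f\Bigr\| = \operatorname{lub} \Bigl|\lambda\Bigl(\int_a^b f\Bigr)\Bigr| / \|\lambda\| \le (b - a)\|f\|_\infty, \]

the extreme members of which form the desired inequality.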


CHAPTER 5 


SCALAR PRODUCT SPACES


In this short chapter we shall look into what is going on behind two-norms, and
we shall find that a wholly new branch of linear analysis is opened up. These
norms can be characterized abstractly as those arising from scalar products.
They are the finite and infinite-dimensional analogues of ordinary geometric
length, and they carry with them practically all the concepts of Euclidean
geometry, such as the notion of the angle between two vectors, perpendicularity
(orthogonality) and the Pythagorean theorem, and the existence of many
rigid motions.

The impact of this extra structure is particularly dramatic for infinite-dimensional
spaces. Infinite orthogonal bases exist in great profusion and can
be handled about as easily as bases in finite-dimensional spaces, although the
basis expansion of a vector is now a convergent infinite series, $\xi = \sum_1^\infty x_i\alpha_i$.
Many of the most important series expansions in mathematics are examples of
such orthogonal basis expansions. For example, we shall see in the next chapter
that the Fourier series expansion of a continuous function $f$ on $[0, \pi]$ is the basis
expansion of $f$ under the two-norm $\|f\|_2 = (\int_0^\pi f^2)^{1/2}$ for the particular orthogonal
basis $\{\alpha_n\}_1^\infty = \{\sin nt\}_1^\infty$. If a vector space is complete under a scalar product
norm, it is called a Hilbert space. The more advanced theory of such spaces is
one of the most beautiful parts of mathematics.


1. SCALAR PRODUCTS

A scalar product on a real vector space $V$ is a real-valued function from $V \times V$
to $\mathbb{R}$, its value at the pair $\langle \xi, \eta \rangle$ ordinarily being designated $(\xi, \eta)$, such that
a) $(\xi, \eta)$ is linear in $\xi$ when $\eta$ is held fixed;
b) $(\xi, \eta) = (\eta, \xi)$ (symmetry);
c) $(\xi, \xi) > 0$ if $\xi \ne 0$ (positive definiteness).
If (c) is replaced by the weaker condition
c') $(\xi, \xi) \ge 0$ for all $\xi \in V$,


then $(\xi, \eta)$ is called a semiscalar product.
Two important examples of scalar products are

\[ (x, y) = \sum_1^n x_i y_i \quad \text{when } V = \mathbb{R}^n \]

and

\[ (f, g) = \int_a^b f(t)g(t)\,dt \quad \text{when } V = \mathcal{C}([a, b]). \]


On a complex vector space (b) must be replaced by

b') $(\xi, \eta) = \overline{(\eta, \xi)}$ (Hermitian symmetry),

where the bar denotes complex conjugation. The corresponding examples are
$(z, w) = \sum_1^n z_i \bar{w}_i$ when $V = \mathbb{C}^n$ and $(f, g) = \int_a^b f\bar{g}$ when $V$ is the space of
continuous complex-valued functions on $[a, b]$. We shall study only the real case.

It follows from (a) and (b) that a semiscalar product is also linear in the
second variable when the first variable is held fixed, and therefore is a symmetric
bilinear functional whose associated quadratic form $q(\xi) = (\xi, \xi)$ is positive
definite or positive semidefinite [(c) or (c'); see the last section in Chapter 2].
The definiteness of the form $q$ has far-reaching consequences, as we shall begin
to see at once.


Theorem 1.1. The Schwarz inequality

\[ |(\xi, \eta)| \le (\xi, \xi)^{1/2}(\eta, \eta)^{1/2} \]

is valid for any semiscalar product.


Proof. We have $0 \le (\xi - t\eta, \xi - t\eta) = (\xi, \xi) - 2t(\xi, \eta) + t^2(\eta, \eta)$ for every
$t \in \mathbb{R}$. Since this quadratic in $t$ is never negative, it cannot have distinct roots,
and the usual $(b^2 - 4ac)$-formula implies that $4(\xi, \eta)^2 - 4(\xi, \xi)(\eta, \eta) \le 0$,
which is equivalent to the Schwarz inequality. □


We can also proceed directly. If $(\eta, \eta) > 0$, and if we set $t = (\xi, \eta)/(\eta, \eta)$
in the quadratic inequality in the first line of the proof, then the resulting
expression simplifies to the Schwarz inequality. If $(\eta, \eta) = 0$, then $(\xi, \eta)$ must
also be 0 (or else the beginning inequality is clearly false for some $t$), and now
the Schwarz inequality holds trivially.
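
The direct argument can be checked numerically. The following is a minimal sketch, assuming the ordinary dot product on $\mathbb{R}^3$ and two arbitrarily chosen vectors; the helper names `dot`, `schwarz_gap`, and `quadratic_at` are illustrative and do not come from the text. It evaluates the quadratic $(\xi - t\eta, \xi - t\eta)$ at $t = (\xi, \eta)/(\eta, \eta)$ and compares the result with the Schwarz gap $(\xi, \xi)(\eta, \eta) - (\xi, \eta)^2$.

```python
from fractions import Fraction

def dot(x, y):
    """Ordinary scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def schwarz_gap(x, y):
    """(x, x)(y, y) - (x, y)^2, which the Schwarz inequality says is >= 0."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2

def quadratic_at(x, y, t):
    """Value of (x - t*y, x - t*y), the quadratic used in the proof."""
    diff = [a - t * b for a, b in zip(x, y)]
    return dot(diff, diff)

# Arbitrary sample vectors (an assumption made for illustration).
xi = [Fraction(1), Fraction(2), Fraction(3)]
eta = [Fraction(2), Fraction(-1), Fraction(4)]

t_star = dot(xi, eta) / dot(eta, eta)      # the special value of t
gap = schwarz_gap(xi, eta)
min_value = quadratic_at(xi, eta, t_star)  # equals gap / (eta, eta)
```

For these vectors the quadratic at the special value of $t$ equals the gap divided by $(\eta, \eta)$, so nonnegativity of the quadratic is exactly the Schwarz inequality.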


Corollary. If $(\xi, \eta)$ is a scalar product, then $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

Proof

\[ \|\xi + \eta\|^2 = (\xi + \eta, \xi + \eta) \]
\[ = \|\xi\|^2 + 2(\xi, \eta) + \|\eta\|^2 \le \|\xi\|^2 + 2\|\xi\|\,\|\eta\| + \|\eta\|^2 \quad \text{(by Schwarz)} \]
\[ = (\|\xi\| + \|\eta\|)^2, \]

proving the triangle inequality. Also, $\|c\xi\| = (c\xi, c\xi)^{1/2} = (c^2(\xi, \xi))^{1/2} =
|c|\,\|\xi\|$. □

Note that the Schwarz inequality $|(\xi, \eta)| \le \|\xi\|\,\|\eta\|$ is now just the statement
that the bilinear functional $(\xi, \eta)$ is bounded by one with respect to the
scalar product norm.


A normed linear space $V$ in which the norm is a scalar product norm is
called a pre-Hilbert space. If $V$ is complete in this norm, it is a Hilbert space.
The two examples of scalar products mentioned earlier give us the real explanation
of our two-norms for the first time:

\[ \|x\|_2 = \Bigl(\sum_1^n x_i^2\Bigr)^{1/2} \quad \text{for } x \in \mathbb{R}^n \]

and

\[ \|f\|_2 = \Bigl(\int_a^b f^2\Bigr)^{1/2} \quad \text{for } f \in \mathcal{C}([a, b]) \]

are scalar product norms.

Since the scalar product norm on $\mathbb{R}^n$ becomes Euclidean length under a
Cartesian coordinate correspondence with Euclidean n-space, it is conventional
to call $\mathbb{R}^n$ itself Euclidean n-space $\mathbb{E}^n$ when we want it understood that the
scalar product norm is being used.

Any finite-dimensional space $V$ is a Hilbert space with respect to any scalar
product norm, because its finite dimensionality guarantees its completeness.
On the other hand, we shall see in Exercise 1.10 that $\mathcal{C}([a, b])$ is incomplete in the
two-norm, and is therefore a pre-Hilbert space but not a Hilbert space in this
norm. (Remember, however, that $\mathcal{C}([a, b])$ is complete in the uniform norm
$\|f\|_\infty$.) It is important to the real uses of Hilbert spaces in mathematics that any
pre-Hilbert space can be completed to a Hilbert space, but the theory of infinite-dimensional
Hilbert spaces is for the most part beyond the scope of this book.

Scalar product norms have in some sense the smoothest possible unit 
spheres, because these spheres are quadratic surfaces. 

It is orthogonality that gives the theory of pre-Hilbert spaces its special
flavor. Two vectors $\alpha$ and $\beta$ are said to be orthogonal, written $\alpha \perp \beta$, if $(\alpha, \beta) = 0$.
This definition gets its inspiration from geometry; we noted in Chapter 1 that
two geometric vectors are perpendicular if and only if their coordinate triples $x$
and $y$ satisfy $(x, y) = 0$. It is an interesting problem to go further and to show
from the law of cosines ($c^2 = a^2 + b^2 - 2ab\cos\theta$) that the angle $\theta$ between two
geometric vectors is given by $(x, y) = \|x\|\,\|y\|\cos\theta$. This would motivate us
to define the angle $\theta$ between two vectors $\xi$ and $\eta$ in a pre-Hilbert space by
$(\xi, \eta) = \|\xi\|\,\|\eta\|\cos\theta$, but we shall have no use for this more general formulation.

We say that two subsets $A$ and $B$ are orthogonal, and we write $A \perp B$, if
$\alpha \perp \beta$ for every $\alpha$ in $A$ and $\beta$ in $B$; for any subset $A$ we set $A^\perp = \{\beta \in V : \beta \perp A\}$.


Lemma 1.1. If $\beta$ is orthogonal to the set $A$, then $\beta$ is orthogonal to $\bar{L}(A)$, the
closure of the linear span of $A$. It follows that $B^\perp$ is a closed subspace for
every subset $B$.


Proof. The first assertion depends on the linearity and continuity of the scalar
product in one of its variables; it will be left to the reader. As for $A = B^\perp$, it
includes the closure of its own linear span, by the first part, and so is a closed
subspace. □


Lemma 1.2. In any pre-Hilbert space we have the parallelogram law,

\[ \|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2(\|\alpha\|^2 + \|\beta\|^2), \]

and the Pythagorean theorem,

\[ \alpha \perp \beta \quad \text{if and only if} \quad \|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2. \]

If $\{\alpha_i\}_1^n$ is a (pairwise) orthogonal collection of vectors, then

\[ \Bigl\|\sum_1^n \alpha_i\Bigr\|^2 = \sum_1^n \|\alpha_i\|^2. \]

Proof. Since $\|\alpha + \beta\|^2 = (\alpha + \beta, \alpha + \beta) = \|\alpha\|^2 + 2(\alpha, \beta) + \|\beta\|^2$, by the bilinearity of the
scalar product, we see that $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2$ if and only if
$(\alpha, \beta) = 0$, which is the Pythagorean theorem. Writing down the similar
expansion of $\|\alpha - \beta\|^2$ and adding, we have the parallelogram law. The last
statement follows from the Pythagorean theorem and Lemma 1.1 by induction.
Or we can obtain this statement directly by expanding the scalar product on the
left and noticing that all "mixed terms" drop out by orthogonality. □
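
Both identities of Lemma 1.2 are easy to test on concrete vectors. The following is a minimal sketch, assuming the ordinary dot product on $\mathbb{R}^3$; the vectors and the helper names (`dot`, `norm_sq`, `add`, `sub`) are chosen here for illustration only.

```python
def dot(x, y):
    """Ordinary scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm_sq(x):
    """Squared scalar product norm."""
    return dot(x, x)

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

# Parallelogram law for two arbitrary (not orthogonal) vectors.
alpha = [3, 0, 4]
beta = [1, 2, -2]
parallelogram_lhs = norm_sq(add(alpha, beta)) + norm_sq(sub(alpha, beta))
parallelogram_rhs = 2 * (norm_sq(alpha) + norm_sq(beta))

# Pythagorean identity for a pairwise orthogonal family.
family = [[1, 1, 0], [1, -1, 0], [0, 0, 2]]
total = [0, 0, 0]
for v in family:
    total = add(total, v)
pythagoras_lhs = norm_sq(total)
pythagoras_rhs = sum(norm_sq(v) for v in family)
```

The mixed terms vanish in the second computation precisely because the three vectors of `family` are pairwise orthogonal.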


The reader will notice that the Schwarz inequality has not been used in this
lemma, but it would have been silly to state the lemma before proving that
$\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

If $\{\alpha_i\}_1^n$ are orthogonal and nonzero, then the identity $\|\sum_1^n x_i\alpha_i\|^2 =
\sum_1^n x_i^2\|\alpha_i\|^2$ shows that $\sum_1^n x_i\alpha_i$ can be zero only if all the coefficients $x_i$ are zero.
Thus,


Corollary. A finite collection of (pairwise) orthogonal nonzero vectors is
independent. Similarly, a finite collection of orthogonal subspaces is
independent.


EXERCISES 


1.1 Complete the second proof of Theorem 1.1. 
1.2 Reexamine the proof of Theorem 1.1 and show that if $\xi$ and $\eta$ are independent,
then the Schwarz inequality is strict.
1.3 Continuing the above exercise, now show that the triangle inequality is strict
if $\xi$ and $\eta$ are independent.
1.4 a) Show that the sum of two semiscalar products is a semiscalar product.
b) Show that if $(\mu, \nu)$ is a semiscalar product on a vector space $W$ and if $T$ is a
linear transformation from a vector space $V$ to $W$, then $[\xi, \eta] = (T\xi, T\eta)$ is a
semiscalar product on $V$.
c) Deduce from (a) and (b) that

\[ (f, g) = f(a)g(a) + \int_a^b f'(t)g'(t)\,dt \]

is a semiscalar product on $V = \mathcal{C}^1([a, b])$. Prove that it is a scalar product.

1.5 If $\alpha$ is held fixed, we know that $f(\xi) = (\xi, \alpha)$ is continuous. Why? Prove more
generally that $(\xi, \eta)$ is continuous as a map from $V \times V$ to $\mathbb{R}$.

1.6 Let $V$ be a two-dimensional Hilbert space, and let $\{\alpha_1, \alpha_2\}$ be any basis for $V$.
Show that a scalar product $(\xi, \eta)$ has the form

\[ (\xi, \eta) = a x_1 y_1 + b(x_1 y_2 + x_2 y_1) + c x_2 y_2, \]

where $b^2 < ac$. Here, of course, $\xi = x_1\alpha_1 + x_2\alpha_2$, $\eta = y_1\alpha_1 + y_2\alpha_2$.

1.7 Prove that if $\omega(x, y) = a x_1 y_1 + b(x_1 y_2 + x_2 y_1) + c x_2 y_2$ and $b^2 < ac$, then $\omega$
is a scalar product on $\mathbb{R}^2$.

1.8 Let $\omega(\xi, \eta)$ be any symmetric bilinear functional on a finite-dimensional vector
space $V$, and let $q(\xi) = \omega(\xi, \xi)$ be its associated quadratic form. Show that for any
choice of a basis for $V$ the equation $q(\xi) = 1$ becomes a quadratic equation in the
coordinates $\{x_i\}$ of $\xi$.

1.9 Prove in detail that if a vector $\beta$ is orthogonal to a set $A$ in a pre-Hilbert space,
then $\beta$ is orthogonal to $\bar{L}(A)$.

1.10 We know from the last chapter that the Riemann integral is defined for the set
$\mathcal{E}$ of uniform limits of real-valued step functions on $[0, 1]$ and that $\mathcal{E}$ includes all the
continuous functions. Given that $\xi$ is the step function whose value is 1 on $[0, \tfrac{1}{2}]$ and
0 on $(\tfrac{1}{2}, 1]$, show that $\|f - \xi\|_2 > 0$ for any continuous function $f$. Show, however,
that there is a sequence of continuous functions $\{f_n\}$ such that $\|f_n - \xi\|_2 \to 0$. Show,
therefore, that $\mathcal{C}([0, 1])$ is incomplete in the two-norm, by showing that the above
sequence $\{f_n\}$ is Cauchy but not convergent in $\mathcal{C}([0, 1])$.


2. ORTHOGONAL PROJECTION 


One of the most important devices in geometric reasoning is "dropping a perpendicular"
from a point to a line or a plane and then using right triangle
arguments. This device is equally important in pre-Hilbert space theory. If $M$
is a subspace and $\alpha$ is any element in $V$, then by "the foot of the perpendicular
dropped from $\alpha$ to $M$" we mean that vector $\mu$ in $M$ such that $(\alpha - \mu) \perp M$,
if such a $\mu$ exists. (See Fig. 5.1.) Writing $\alpha$ as $\mu + (\alpha - \mu)$, we see that the
existence of the "foot" $\mu$ in $M$ for each $\alpha$ in $V$ is equivalent
to the direct sum decomposition $V = M \oplus M^\perp$.
Now it is precisely this direct sum decomposition
that the completeness of a Hilbert space guarantees,
as we shall shortly see. We start by proving the
geometrically intuitive fact that $\mu$ is the foot of the
perpendicular dropped from $\alpha$ to $M$ if and only if $\mu$
is the point in $M$ closest to $\alpha$.

[Fig. 5.1]


Lemma 2.1. If $\mu$ is in the subspace $M$, then $(\alpha - \mu) \perp M$ if and only if $\mu$ is
the unique point in $M$ closest to $\alpha$, that is, $\mu$ is the "best approximation" to
$\alpha$ in $M$.


Proof. If $(\alpha - \mu) \perp M$ and $\xi$ is any other point in $M$, then $\|\alpha - \xi\|^2 =
\|(\alpha - \mu) + (\mu - \xi)\|^2 = \|\alpha - \mu\|^2 + \|\mu - \xi\|^2 > \|\alpha - \mu\|^2$. Thus $\mu$ is the
unique point in $M$ closest to $\alpha$. Conversely, suppose that $\mu$ is a point in $M$ closest
to $\alpha$, and let $\xi$ be any nonzero vector in $M$. Then $\|\alpha - \mu\|^2 \le \|(\alpha - \mu) + t\xi\|^2$,
which becomes $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$ when the right-hand scalar product is
expanded. This can hold for all $t$ only if $(\alpha - \mu, \xi) = 0$ (otherwise let $t = {?}$).
Therefore, $(\alpha - \mu) \perp M$. □


On the basis of this lemma it is clear that a way to look for $\mu$ is to take a
sequence $\mu_n$ in $M$ such that $\|\alpha - \mu_n\| \to \rho(\alpha, M)$ and to hope to define $\mu$ as its
limit. Here is the crux of the matter: We can prove that such a sequence $\{\mu_n\}$ is
always Cauchy, but its limit may not exist if $M$ is not complete!


Lemma 2.2. If $\{\mu_n\}$ is a sequence in the subspace $M$ whose distance from
some vector $\alpha$ converges to the distance $\rho$ from $\alpha$ to $M$, then $\{\mu_n\}$ is Cauchy.


Proof. By the parallelogram law, $\|\mu_n - \mu_m\|^2 = \|(\alpha - \mu_n) - (\alpha - \mu_m)\|^2 =
2(\|\alpha - \mu_n\|^2 + \|\alpha - \mu_m\|^2) - \|2\alpha - (\mu_n + \mu_m)\|^2$. Since the first term on the
right converges to $4\rho^2$ as $n, m \to \infty$, and since the second term is always
$\le -4\rho^2$ (factor out the 2), we see that $\|\mu_n - \mu_m\|^2 \to 0$ as $n, m \to \infty$. □


Theorem 2.1. If $M$ is a complete subspace of a pre-Hilbert space $V$, then
$V = M \oplus M^\perp$. In particular, this is true for any finite-dimensional subspace
of a pre-Hilbert space and for any closed subspace of a Hilbert space.


Proof. This follows at once from the last two lemmas, since now $\mu = \lim \mu_n$
exists, $\|\alpha - \mu\| = \rho(\alpha, M)$, and so $(\alpha - \mu) \perp M$. □


If $V = M \oplus M^\perp$, then the projection on $M$ along $M^\perp$ is called the orthogonal
projection on $M$, or simply the projection on $M$, since among all the projections on
$M$ associated with the various complements of $M$, the orthogonal projection is
distinguished. Thus, if $M$ is a complete subspace of $V$, and if $P$ is the projection
on $M$, then $P(\xi)$ is at once the foot of the perpendicular dropped from $\xi$ to $M$
(which is where the word "projection" comes from) and also the best approximation
to $\xi$ in $M$ (Lemma 2.1).


Lemma 2.3. If $\{M_i\}_1^n$ is a finite collection of complete, pairwise orthogonal
subspaces, and if for a vector $\alpha$ in $V$, $\alpha_i$ is the projection of $\alpha$ on $M_i$ for
$i = 1, \ldots, n$, then $\sum_1^n \alpha_i$ is the projection of $\alpha$ on $\bigoplus_1^n M_i$.


Proof. We have to show that $\alpha - \sum_1^n \alpha_i$ is orthogonal to $\bigoplus_1^n M_i$, and it is
sufficient to show it orthogonal to each $M_j$ separately. But if $\xi \in M_j$, then
$(\alpha - \sum_1^n \alpha_i, \xi) = (\alpha - \alpha_j, \xi)$, since $(\alpha_i, \xi) = 0$ for $i \ne j$, and $(\alpha - \alpha_j, \xi) = 0$
because $\alpha_j$ is the projection of $\alpha$ on $M_j$. Thus $(\alpha - \sum_1^n \alpha_i, \xi) = 0$. □


Lemma 2.4. The projection of $\xi$ on the one-dimensional span of a single
nonzero vector $\eta$ is $((\xi, \eta)/\|\eta\|^2)\eta$.


Proof. Here $\mu$ must be of the form $x\eta$. But $(\xi - x\eta) \perp \eta$ if and only if
$0 = (\xi - x\eta, \eta) = (\xi, \eta) - x\|\eta\|^2$, or $x = (\xi, \eta)/\|\eta\|^2$. □
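
Lemma 2.4 translates directly into a computation. The following is a minimal sketch, assuming the ordinary dot product on $\mathbb{R}^3$; the function name `project_onto_line` and the sample vectors are illustrative choices made here. It returns the coefficient $x = (\xi, \eta)/\|\eta\|^2$ together with the projection $x\eta$, and the residual $\xi - x\eta$ is then checked to be orthogonal to $\eta$.

```python
from fractions import Fraction

def dot(x, y):
    """Ordinary scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def project_onto_line(xi, eta):
    """Return (x, x*eta), where x = (xi, eta) / ||eta||^2 as in Lemma 2.4."""
    eta_norm_sq = dot(eta, eta)
    if eta_norm_sq == 0:
        raise ValueError("eta must be a nonzero vector")
    x = Fraction(dot(xi, eta), eta_norm_sq)
    return x, [x * component for component in eta]

# Sample integer vectors (an assumption made for illustration).
xi = [3, 1, 2]
eta = [1, 1, 0]

coeff, mu = project_onto_line(xi, eta)
residual = [a - b for a, b in zip(xi, mu)]

# Lemma 2.1 in miniature: other multiples of eta are no closer to xi.
other = [a - 3 * b for a, b in zip(xi, eta)]
```

Here `dot(residual, residual)` is the squared distance from $\xi$ to the line, and it is smaller than the squared distance to the competing point $3\eta$.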


We call the number $(\xi, \eta)/\|\eta\|^2$ the $\eta$-Fourier coefficient of $\xi$. If $\eta$ is a unit
(normalized) vector, then this Fourier coefficient is just $(\xi, \eta)$. It follows from
Lemma 2.3 that if $\{\varphi_i\}_1^n$ is an orthogonal collection of nonzero vectors, and if
$\{x_i\}_1^n$ are the corresponding Fourier coefficients of a vector $\xi$, then $\sum_1^n x_i\varphi_i$ is the
projection of $\xi$ on the subspace $M$ spanned by $\{\varphi_i\}_1^n$. Therefore, $\xi - \sum_1^n x_i\varphi_i \perp M$,
and (Lemma 2.1) $\sum_1^n x_i\varphi_i$ is the best approximation to $\xi$ in $M$. If $\xi$ is in $M$, then
both of these statements say that $\xi = \sum_1^n x_i\varphi_i$. (This can of course be verified
directly, by letting $\xi = \sum_1^n a_i\varphi_i$ be the basis expansion of $\xi$ and computing
$(\xi, \varphi_j) = \sum_i a_i(\varphi_i, \varphi_j) = a_j\|\varphi_j\|^2$.)

If an orthogonal set of vectors $\{\varphi_i\}$ is also normalized ($\|\varphi_i\| = 1$), then we
call the set orthonormal.


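Theorem 2.2. If $\{\varphi_i\}_1^\infty$ is an infinite orthonormal sequence, and if $\{x_i\}_1^\infty$ are
the corresponding Fourier coefficients of a vector $\xi$, then

\[ \sum_1^\infty x_i^2 \le \|\xi\|^2 \quad \text{(Bessel's inequality)}, \]

and $\xi = \sum_1^\infty x_i\varphi_i$ if and only if $\sum_1^\infty x_i^2 = \|\xi\|^2$ (Parseval's equation).


Proof. Setting $\sigma_n = \sum_1^n x_i\varphi_i$ and $\xi = (\xi - \sigma_n) + \sigma_n$, and remembering that
$\xi - \sigma_n \perp \sigma_n$, we have

\[ \|\xi\|^2 = \|\xi - \sigma_n\|^2 + \sum_1^n x_i^2. \]

Therefore, $\sum_1^n x_i^2 \le \|\xi\|^2$ for all $n$, proving Bessel's inequality, and $\sigma_n \to \xi$ (that
is, $\|\xi - \sigma_n\| \to 0$) if and only if $\sum_1^n x_i^2 \to \|\xi\|^2$, proving Parseval's identity. □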
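
The identity $\|\xi\|^2 = \|\xi - \sigma_n\|^2 + \sum_1^n x_i^2$ at the heart of this proof can be observed in a finite-dimensional stand-in. The following is a minimal sketch, assuming $\mathbb{R}^3$ with the ordinary dot product and a hand-picked orthonormal family with rational entries; the names `phis` and `partial_sum_data` are illustrative and not from the text.

```python
from fractions import Fraction as F

def dot(x, y):
    """Ordinary scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

# An orthonormal family in R^3 with rational entries (chosen for illustration).
phis = [
    [F(3, 5), F(4, 5), F(0)],
    [F(-4, 5), F(3, 5), F(0)],
    [F(0), F(0), F(1)],
]
xi = [F(1), F(2), F(2)]

def partial_sum_data(n):
    """Return (sum of x_i^2 for i <= n, ||xi - sigma_n||^2) using the first n vectors."""
    coeffs = [dot(xi, phi) for phi in phis[:n]]          # Fourier coefficients x_i
    sigma = [sum(c * phi[k] for c, phi in zip(coeffs, phis))
             for k in range(len(xi))]                    # partial sum sigma_n
    residual = [a - b for a, b in zip(xi, sigma)]
    return sum(c * c for c in coeffs), dot(residual, residual)

bessel_two = partial_sum_data(2)    # strict inequality: xi is not in the span
bessel_three = partial_sum_data(3)  # equality: the family is now a basis of R^3
```

With two vectors the coefficient sum falls short of $\|\xi\|^2 = 9$ by exactly the squared residual; with all three the residual vanishes and Parseval's equation holds.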


We call the formal series $\sum x_i\varphi_i$ the Fourier series of $\xi$ (with respect to the
orthonormal set $\{\varphi_i\}$). The Parseval condition says that the Fourier series of $\xi$
converges to $\xi$ if and only if $\|\xi\|^2 = \sum_1^\infty x_i^2$.

An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is called a basis for a pre-Hilbert
space $V$ if every element in $V$ is the sum of its Fourier series.


Theorem 2.3. An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis for a pre-Hilbert
space $V$ if (and only if) its linear span is dense in $V$.


Proof. Let $\xi$ be any element of $V$, and let $\{x_i\}$ be its sequence of Fourier
coefficients. Since the linear span of $\{\varphi_i\}$ is dense in $V$, given any $\epsilon$, there is a
finite linear combination $\sum_1^n y_i\varphi_i$ which approximates $\xi$ to within $\epsilon$. But
$\sum_1^m x_i\varphi_i$ is the best approximation to $\xi$ in the span of $\{\varphi_i\}_1^m$, by Lemmas 2.3 and
2.1, and so

\[ \Bigl\|\xi - \sum_1^m x_i\varphi_i\Bigr\| < \epsilon \quad \text{for any } m \ge n. \]

That is, $\xi = \sum_1^\infty x_i\varphi_i$. □


Corollary. If $V$ is a Hilbert space, then the orthonormal sequence $\{\varphi_i\}_1^\infty$
is a basis if and only if $\{\varphi_i\}^\perp = \{0\}$.


Proof. Let $M$ be the closure of the linear span of $\{\varphi_i\}_1^\infty$. Since $V = M \oplus M^\perp$,
and since $M^\perp = \{\varphi_i\}^\perp$, by Lemma 1.1, we see that $\{\varphi_i\}^\perp = \{0\}$ if and only if
$V = M$, and, by the theorem, this holds if and only if $\{\varphi_i\}$ is a basis. □


Note that when orthogonal bases only are being used, the coefficient of a
vector $\xi$ at a basis element $\beta$ is always the Fourier coefficient $(\xi, \beta)/\|\beta\|^2$.
Thus the $\beta$-coefficient of $\xi$ depends only on $\beta$ and is independent of the choice
of the rest of the basis. However, we know from Chapter 2 that when an arbitrary
basis containing $\beta$ is being used, then the $\beta$-coefficient of $\xi$ varies with the
basis. This partly explains the favored position of orthogonal bases.

We often obtain an orthonormal sequence by "orthogonalizing" some given
sequence.


Lemma 2.5. If $\{\alpha_i\}$ is a finite or infinite sequence of independent vectors,
then there is an orthonormal sequence $\{\varphi_i\}$ such that $\{\alpha_i\}_1^n$ and $\{\varphi_i\}_1^n$ have
the same linear span for all $n$.


Proof. Since normalizing is trivial, we shall only orthogonalize. Suppose, to be
definite, that the sequence is infinite, and let $M_n$ be the linear span of
$\{\alpha_1, \ldots, \alpha_n\}$. Let $\mu_n$ be the orthogonal projection of $\alpha_n$ on $M_{n-1}$, and set
$\varphi_n = \alpha_n - \mu_n$ (and $\varphi_1 = \alpha_1$). This is our sequence. We have $\varphi_i \in M_i \subset M_{n-1}$
if $i < n$, and $\varphi_n \perp M_{n-1}$, so that the vectors $\varphi_i$ are mutually orthogonal.
Also, $\varphi_n \ne 0$, since $\alpha_n$ is not in $M_{n-1}$. Thus $\{\varphi_i\}_1^n$ is an independent subset of
the $n$-dimensional vector space $M_n$, by the corollary of Lemma 1.2, and so
$\{\varphi_i\}_1^n$ spans $M_n$. □


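The actual calculation of the orthogonalized sequence $\{\varphi_n\}$ can be carried
out recursively, starting with $\varphi_1 = \alpha_1$, by noticing that since $\mu_n$ is the projection
of $\alpha_n$ on the span of $\varphi_1, \ldots, \varphi_{n-1}$, it must be the vector $\sum_1^{n-1} c_i\varphi_i$, where $c_i$ is
the Fourier coefficient of $\alpha_n$ with respect to $\varphi_i$.

Consider, for example, the sequence $\{x^n\}_0^\infty$ in $\mathcal{C}([0, 1])$. We have $\varphi_1 =
\alpha_1 = 1$. Next, $\varphi_2 = \alpha_2 - \mu_2 = x - c \cdot 1$, where

\[ c = (\alpha_2, \varphi_1)/\|\varphi_1\|^2 = \int_0^1 x \cdot 1 \Big/ \int_0^1 (1)^2 = \tfrac{1}{2}. \]

Then $\varphi_3 = \alpha_3 - (c_2\varphi_2 + c_1\varphi_1)$, where

\[ c_1 = \int_0^1 x^2 \cdot 1 \Big/ \int_0^1 (1)^2 = \tfrac{1}{3} \]

and

\[ c_2 = \int_0^1 x^2\bigl(x - \tfrac{1}{2}\bigr) \Big/ \int_0^1 \bigl(x - \tfrac{1}{2}\bigr)^2 = \bigl(\tfrac{1}{4} - \tfrac{1}{6}\bigr) \big/ \bigl(\tfrac{1}{12}\bigr) = 1. \]

Thus the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([0, 1])$ are $1$,
$x - \tfrac{1}{2}$, $x^2 - (x - \tfrac{1}{2}) - \tfrac{1}{3} = x^2 - x + \tfrac{1}{6}$. This process is completely elementary,
but the calculations obviously become burdensome after only a few terms.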
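
Since the calculations become burdensome by hand, the recursion is a natural candidate for mechanical computation. The following is a minimal sketch, assuming polynomials are stored as coefficient lists in increasing degree (so `[1/6, -1, 1]` stands for $x^2 - x + \tfrac{1}{6}$) and using the exact formula $\int_0^1 x^{i+j}\,dx = 1/(i+j+1)$ for the scalar product on $\mathcal{C}([0, 1])$; the function names `inner`, `subtract_multiple`, and `orthogonalize` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def inner(p, q):
    """(p, q) = integral over [0, 1] of p(x) q(x) dx for coefficient lists p, q."""
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def subtract_multiple(p, c, q):
    """Return the polynomial p - c*q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [Fraction(a) - c * Fraction(b) for a, b in zip(p, q)]

def orthogonalize(vectors):
    """Orthogonalize (without normalizing): phi_n = alpha_n - sum of c_i * phi_i."""
    phis = []
    for alpha in vectors:
        phi = [Fraction(a) for a in alpha]
        for prev in phis:
            # Fourier coefficient of alpha_n with respect to the earlier phi_i.
            c = inner(alpha, prev) / inner(prev, prev)
            phi = subtract_multiple(phi, c, prev)
        phis.append(phi)
    return phis

monomials = [[1], [0, 1], [0, 0, 1]]   # 1, x, x^2
phi1, phi2, phi3 = orthogonalize(monomials)
```

The three results reproduce the polynomials found above, and extending `monomials` with higher powers continues the sequence without further hand computation.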
We remember from general bilinear theory that if for $\beta$ in $V$ we define
$\theta_\beta: V \to \mathbb{R}$ by $\theta_\beta(\xi) = (\xi, \beta)$, then $\theta_\beta \in V^*$ and $\theta: \beta \mapsto \theta_\beta$ is a linear mapping
from $V$ to $V^*$. If $(\xi, \eta)$ is a scalar product, then $\theta_\beta(\beta) = \|\beta\|^2 > 0$ if $\beta \ne 0$, and
so $\theta$ is injective. Actually, $\theta$ is an isometry, as we shall ask the reader to show in
an exercise. If $V$ is finite-dimensional, the injectivity of $\theta$ implies that $\theta$ is an
isomorphism. But we have a much more startling result:


Theorem 2.4. $\theta$ is an isomorphism if and only if $V$ is a Hilbert space.


Proof. Suppose first that $V$ is a Hilbert space. We have to show that $\theta$ is surjective,
i.e., that every nonzero $F$ in $V^*$ is of the form $\theta_\beta$. Given such an $F$, let $N$
be its null space, let $\alpha$ be a nonzero vector orthogonal to $N$ (Theorem 2.1), and consider
$\beta = c\alpha$, where $c$ is to be determined later. Every vector $\xi$ in $V$ is uniquely a sum
$\xi = x\beta + \eta$, where $\eta$ is in $N$. [This only says that $V/N$ is one-dimensional, which
presumably we know, but we can check it directly by applying $F$ and seeing that
$F(\xi - x\beta) = 0$ if and only if $x = F(\xi)/F(\beta)$.] But now the equations

\[ F(\xi) = F(x\beta + \eta) = xF(\beta) = xcF(\alpha) \]

and

\[ \theta_\beta(\xi) = (\xi, \beta) = (x\beta + \eta, \beta) = x\|\beta\|^2 = xc^2\|\alpha\|^2 \]

show that $\theta_\beta = F$ if we take $c = F(\alpha)/\|\alpha\|^2$.

Conversely, if $\theta$ is surjective (and assuming that it is an isometry), then it is
an isomorphism in $\operatorname{Hom}(V, V^*)$, and since $V^*$ is complete by Theorem 7.6,
Chapter 4, it follows that $V$ is complete by Theorem 7.3 of the same chapter.
We are finished. □


EXERCISES 


2.1 In the proof of Lemma 2.1, if $(\alpha - \mu, \xi) \ne 0$, what value of $t$ will contradict
the inequality $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$?
2.2 Prove the "only if" part of Theorem 2.3.


2.3 Let $\{M_i\}$ be an orthogonal sequence of complete subspaces of a pre-Hilbert
space $V$, and let $P_i$ be the (orthogonal) projection on $M_i$. Prove that $\{P_i\xi\}$ is Cauchy
for any $\xi$ in $V$.


2.4 Show that the functions $\{\sin nt\}_{n=1}^\infty$ form an orthogonal collection of elements in
the pre-Hilbert space $\mathcal{C}([0, \pi])$ with respect to the standard scalar product $(f, g) =
\int_0^\pi f(t)g(t)\,dt$. Show also that $\|\sin nt\|_2 = \sqrt{\pi/2}$.

2.5 Compute the Fourier coefficients of the function $f(t) = t$ in $\mathcal{C}([0, \pi])$ with
respect to the above orthogonal set. What then is the best two-norm approximation
to $t$ in the two-dimensional space spanned by $\sin t$ and $\sin 2t$? Sketch the graph of this
approximating function, indicating its salient features in the usual manner of calculus
curve sketching.

2.6 The "step" function $f$ defined by $f(t) = \pi/2$ on $[0, \pi/2]$ and $f(t) = 0$ on $(\pi/2, \pi]$
is of course discontinuous at $\pi/2$. Nevertheless, calculate its Fourier coefficients with
respect to $\{\sin nt\}_{n=1}^\infty$ in $\mathcal{C}([0, \pi])$ and graph its best approximation in the span of
{sin nt}? 

2.7 Show that the functions $\{\sin nt\}_{n=1}^\infty \cup \{\cos nt\}_{n=0}^\infty$ form an orthogonal collection
of elements in the pre-Hilbert space $\mathcal{C}([-\pi, \pi])$ with respect to the standard scalar
product $(f, g) = \int_{-\pi}^{\pi} f(t)g(t)\,dt$.

2.8 Calculate the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([-1, 1])$.

2.9 Use the definition of the norm of a bounded linear transformation and the
Schwarz inequality to show that $\|\theta_\beta\| \le \|\beta\|$ (where $\theta_\beta(\xi) = (\xi, \beta)$). In order to
conclude that $\beta \mapsto \theta_\beta$ is an isometry, we also need the opposite inequality, $\|\theta_\beta\| \ge \|\beta\|$.
Prove this by using a special value of $\xi$.
2.10 Show that if $V$ is an incomplete pre-Hilbert space, then $V$ has a proper closed
subspace $M$ such that $M^\perp = \{0\}$. (Hint: There must exist $F \in V^*$ not of the form
$F(\xi) = (\xi, \alpha)$.) Together with Theorem 2.1, this shows that a pre-Hilbert space $V$ is a
Hilbert space if and only if $V = M \oplus M^\perp$ for every closed subspace $M$.
2.11 The isometry $\theta: \alpha \mapsto \theta_\alpha$ [where $\theta_\alpha(\xi) = (\xi, \alpha)$] imbeds the pre-Hilbert space $V$
in its conjugate space $V^*$. We know that $V^*$ is complete. Why? The closure of $V$ as
a subspace of $V^*$ is therefore complete, and we can hence complete $V$ as a Banach
space. Let $H$ be its completion. It is a Banach space including (the isometric image of)
$V$ as a dense subspace. Show that the scalar product on $V$ extends uniquely to $H$ and
that the norm on $H$ is the extended scalar product norm, so that $H$ is a Hilbert space.
2.12 Show that under the isometric imbedding $\alpha \mapsto \theta_\alpha$ of a pre-Hilbert space $V$ into
$V^*$ orthogonality is equivalent to annihilation as discussed in Section 2.3. Discuss the
connection between the properties of the annihilator $A^\circ$ and Lemma 1.1 of this chapter.
2.13 Prove that if $C$ is a nonempty complete convex subset of a pre-Hilbert space $V$,
and if $\alpha$ is any vector not in $C$, then there is a unique $\mu \in C$ closest to $\alpha$. (Examine the
proof of Lemma 2.2.)


3. SELF-ADJOINT TRANSFORMATIONS 


Definition. If $V$ is a pre-Hilbert space, then $T$ in $\operatorname{Hom} V$ is self-adjoint if
$(T\alpha, \beta) = (\alpha, T\beta)$ for every $\alpha, \beta \in V$. The set of all self-adjoint transformations
will be designated SA.


Self-adjointness suggests that $T$ ought to become its own adjoint under the
injection $\theta$ of $V$ into $V^*$. We check this now. Since $(\alpha, \beta) = \theta_\beta(\alpha)$, we can rewrite
the equation $(T\alpha, \beta) = (\alpha, T\beta)$ as $\theta_\beta(T\alpha) = \theta_{T\beta}(\alpha)$, and again as $(T^*(\theta_\beta))(\alpha) =
\theta_{T\beta}(\alpha)$, by the definition of $T^*$. This holds for all $\alpha$ and $\beta$ if and only if $T^*(\theta_\beta) =
\theta_{T\beta}$ for all $\beta \in V$, or $T^* \circ \theta = \theta \circ T$, which is the asserted identification.


Lemma 3.1. If $V$ is a finite-dimensional Hilbert space and $\{\varphi_i\}_1^n$ is an
orthonormal basis for $V$, then $T \in \operatorname{Hom}(V)$ is self-adjoint if and only if the
matrix $\{t_{ij}\}$ of $T$ with respect to $\{\varphi_i\}$ is symmetric ($\mathbf{t} = \mathbf{t}^*$).


Proof. If we substitute the basis expansions of $\alpha$ and $\beta$ and expand, we see that
$(\alpha, T\beta) = (T\alpha, \beta)$ for all $\alpha$ and $\beta$ if and only if $(\varphi_i, T\varphi_j) = (T\varphi_i, \varphi_j)$ for all $i$
and $j$. But $T\varphi_j = \sum_{k=1}^n t_{kj}\varphi_k$, and when this is substituted in these last scalar
products, the equation becomes $t_{ij} = t_{ji}$. That is, $T$ is self-adjoint if and only if
$\mathbf{t} = \mathbf{t}^*$. □


A self-adjoint $T$ is said to be nonnegative if $(T\xi, \xi) \ge 0$ for all $\xi$. Then
$[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product!


Lemma 3.2. If $T$ is a nonnegative self-adjoint transformation, then
$\|T(\xi)\| \le \|T\|^{1/2}(T\xi, \xi)^{1/2}$ for all $\xi$. Therefore, if $(T\xi, \xi) = 0$, then
$T\xi = 0$, and, more generally, if $(T\xi_n, \xi_n) \to 0$, then $T(\xi_n) \to 0$.


Proof. If $T$ is nonnegative as well as self-adjoint, then $[\xi, \eta] = (T\xi, \eta)$ is a
semiscalar product, and so, by Schwarz's inequality,

\[ |(T\xi, \eta)| = |[\xi, \eta]| \le [\xi, \xi]^{1/2}[\eta, \eta]^{1/2} = (T\xi, \xi)^{1/2}(T\eta, \eta)^{1/2}. \]

Taking $\eta = T\xi$, the factor on the right becomes $(T(T\xi), T\xi)^{1/2}$, which is less
than or equal to $\|T\|^{1/2}\|T\xi\|$, by Schwarz and the definition of $\|T\|$. Dividing by
$\|T\xi\|$, we get the inequality of the lemma. □


If $\alpha \ne 0$ and $T(\alpha) = c\alpha$ for some $c$, then $\alpha$ is called an eigenvector (proper
vector, characteristic vector) of $T$, and $c$ is the associated eigenvalue (proper
value, characteristic value).


Theorem 3.1. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint
element of $\operatorname{Hom} V$, then $V$ has an orthonormal basis consisting
entirely of eigenvectors of $T$.


Proof. Consider the function $(T\xi, \xi)$. It is a continuous real-valued function
of $\xi$, and on the unit sphere $S = \{\xi : \|\xi\| = 1\}$ it is bounded above by $\|T\|$
(by Schwarz). Set $m = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1\}$. Since $S$ is compact (being
bounded and closed), $(T\xi, \xi)$ assumes the value $m$ at some point $\alpha$ on $S$. Now
$m - T$ is a nonnegative self-adjoint transformation (Check this!), and $(T\alpha, \alpha) =
m$ is equivalent to $((m - T)\alpha, \alpha) = 0$. Therefore, $(m - T)\alpha = 0$ by Lemma
3.2, and $T\alpha = m\alpha$. We have thus found one eigenvector for $T$. Now set $V_1 = V$,
$\alpha_1 = \alpha$, and $m_1 = m$, and let $V_2$ be $\{\alpha_1\}^\perp$. Then $T[V_2] \subset V_2$, for if $\xi \perp \alpha_1$,
then $(T\xi, \alpha_1) = (\xi, T\alpha_1) = m(\xi, \alpha_1) = 0$.

We can therefore repeat the above argument for the restriction of $T$ to the
Hilbert space $V_2$ and find $\alpha_2$ in $V_2$ such that $\|\alpha_2\| = 1$ and $T(\alpha_2) = m_2\alpha_2$,
where $m_2 = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1 \text{ and } \xi \in V_2\}$. Clearly, $m_2 \le m_1$. We then
set $V_3 = \{\alpha_1, \alpha_2\}^\perp$ and continue, arriving finally at an orthonormal basis
$\{\alpha_i\}_1^n$ of eigenvectors of $T$. □


Now let $\lambda_1, \ldots, \lambda_r$ be the distinct values in the list $m_1, \ldots, m_n$, and let $M_j$
be the linear span of those basis vectors $\alpha_i$ for which $m_i = \lambda_j$. Then the subspaces
$M_j$ are orthogonal to each other, $V = \bigoplus_1^r M_j$, each $M_j$ is $T$-invariant,
and the restriction of $T$ to $M_j$ is $\lambda_j$ times the identity. Since all the nonzero
vectors in $M_j$ are eigenvectors with eigenvalue $\lambda_j$, if the $\alpha_i$'s spanning $M_j$ are
replaced by any other orthonormal basis for $M_j$, then we still have an orthonormal
basis of eigenvectors. The $\alpha_i$'s are therefore not in general uniquely
determined. But the subspaces $M_j$ and the eigenvalues $\lambda_j$ are unique. This will
follow if we show that every eigenvector is in an $M_j$.


Lemma 3.3. In the context of the above discussion, if $\xi \ne 0$ and $T(\xi) = x\xi$
for some $x$ in $\mathbb{R}$, then $\xi \in M_j$ (and so $x = \lambda_j$) for some $j$.


Proof. Since $V = \bigoplus_1^r M_i$, we have $\xi = \sum_1^r \xi_i$ with $\xi_i \in M_i$. Then

\[ \sum_1^r x\xi_i = x\xi = T(\xi) = \sum_1^r T(\xi_i) = \sum_1^r \lambda_i\xi_i \quad \text{and} \quad \sum_1^r (x - \lambda_i)\xi_i = 0. \]

Since the subspaces $M_i$ are independent, every component $(x - \lambda_i)\xi_i$ is 0. But
some $\xi_j \ne 0$, since $\xi \ne 0$. Therefore, $x = \lambda_j$, $\xi_i = 0$ for $i \ne j$, and
$\xi = \xi_j \in M_j$. □

We have thus proved the following theorem.


Theorem 3.2. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint
element of $\operatorname{Hom} V$, then there are uniquely determined subspaces
$\{V_i\}_1^r$ of $V$, and distinct scalars $\{\lambda_i\}_1^r$, such that $\{V_i\}$ is an orthogonal
family whose sum is $V$ and the restriction of $T$ to $V_i$ is $\lambda_i$ times the identity.


If $V$ is a finite-dimensional vector space and we are given $T \in \operatorname{Hom} V$, then
we know how to compute related mappings such as $T^2$ and $T^{-1}$ (if it exists)
and vectors $T\alpha$, $T^{-1}\alpha$, etc., by choosing a basis for $V$ and then computing matrix
products, inverses (when they exist), and so on. Some of these computations,
particularly those related to inverses, can be quite arduous. One enormous
advantage of a basis consisting of eigenvectors for $T$ is that it trivializes all of
these calculations.

To see this, let $\{\beta_n\}$ be a basis of $V$ consisting entirely of eigenvectors for $T$,
and let $\{r_n\}$ be the corresponding eigenvalues. To compute $T\xi$, we write down
the basis expansion for $\xi$, $\xi = \sum_1^n x_i\beta_i$, and then $T\xi = \sum_1^n r_i x_i\beta_i$. $T^2$ has the
same eigenvectors, but with eigenvalues $\{r_i^2\}$. Thus $T^2\xi = \sum_1^n r_i^2 x_i\beta_i$. $T^{-1}$
exists if and only if no $r_i = 0$, in which case it has the same eigenvectors with
eigenvalues $\{1/r_i\}$. Thus $T^{-1}\xi = \sum_1^n (x_i/r_i)\beta_i$. If $P(t) = \sum_0^m a_n t^n$ is any
polynomial, then $P(T)$ takes $\beta_i$ into $P(r_i)\beta_i$. Thus $P(T)\xi = \sum_1^n P(r_i)x_i\beta_i$.
By now the point should be amply clear.
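
The point can be made concrete with a small example. The following is a minimal sketch, assuming the symmetric matrix with rows $(2, 1)$ and $(1, 2)$ on $\mathbb{R}^2$, whose eigenvectors $\langle 1, 1 \rangle$ and $\langle 1, -1 \rangle$ (eigenvalues 3 and 1) are orthogonal but not normalized, so the coefficients are the Fourier coefficients $(\xi, \beta_i)/\|\beta_i\|^2$; the matrix, the vector $\xi$, and the helper names are all choices made here for illustration.

```python
from fractions import Fraction

T = [[2, 1], [1, 2]]                       # a symmetric (self-adjoint) matrix
eigenpairs = [(3, [1, 1]), (1, [1, -1])]   # (eigenvalue r_i, eigenvector beta_i)

def dot(x, y):
    """Ordinary scalar product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def apply(matrix, v):
    """Ordinary matrix-vector product."""
    return [dot(row, v) for row in matrix]

def coefficients(xi):
    """Fourier coefficients x_i = (xi, beta_i) / ||beta_i||^2."""
    return [Fraction(dot(xi, b), dot(b, b)) for _, b in eigenpairs]

def apply_function(g, xi):
    """Return the sum of g(r_i) * x_i * beta_i, i.e. g(T) applied to xi."""
    result = [Fraction(0)] * len(xi)
    for x, (r, b) in zip(coefficients(xi), eigenpairs):
        result = [acc + g(r) * x * comp for acc, comp in zip(result, b)]
    return result

xi = [5, 1]
T_xi = apply_function(lambda r: r, xi)                   # T xi
T_sq_xi = apply_function(lambda r: r * r, xi)            # T^2 xi
T_inv_xi = apply_function(lambda r: Fraction(1, r), xi)  # T^{-1} xi
P_T_xi = apply_function(lambda r: r * r - 4 * r + 3, xi) # P(T) xi, P(t) = (t-1)(t-3)
```

No matrix inverse is ever formed: $T^{-1}\xi$ comes from dividing each coefficient by its eigenvalue, and multiplying the result by `T` recovers $\xi$. The polynomial $P(t) = (t - 1)(t - 3)$ vanishes at both eigenvalues, so $P(T)\xi = 0$.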

The additional value of orthonormality in a basis is already clear from the
last section. Basically, it enables us to compute the coefficients $\{x_i\}$ of $\xi$ by
scalar products: $x_i = (\xi, \beta_i)$.

This is a good place to say a few words about the general eigenvalue problem
in finite-dimensional theory. Our complete analysis above was made possible by
the self-adjointness of $T$ (or the symmetry of the matrix $\mathbf{t}$). What we can say
about an arbitrary $T$ in $\operatorname{Hom} V$ is much less satisfactory.

We first note that the eigenvalues of $T$ can be determined algebraically, for $\lambda$
is an eigenvalue if and only if $T - \lambda I$ is not injective, or, equivalently, is singular,
and we know that $T - \lambda I$ is singular if and only if its determinant $\Delta(T - \lambda I)$
is 0. If we choose any basis for $V$, the determinant of $T - \lambda I$ is the determinant
of its matrix $\mathbf{t} - \lambda\mathbf{e}$, and our later formula in Chapter 7 shows that this is a
polynomial of degree $n$ in $\lambda$. It is easy to see that this polynomial is independent
of the basis; it is called the characteristic polynomial of $T$. Thus the eigenvalues
of $T$ are exactly the roots of the characteristic polynomial of $T$.


260 SCALAR PRODUCT SPACES 5.3 


However, T need not have any cigenvectors! Consider, for exampic, a 90° 
rotation in the Cartesian plane. This is the map T: <2, y> +> <—y,2x>. 
Thus 7(8!) = 8? and T(8*) = —6!, so the matrix of T is 


“a 
ft oa) 


and the characteristic polynomial of T is the determinant of this matrix: A? + 1. 
Since this polynomial is irreducible over R, there are no eigenvalues. 

Note how different the outcome is if we consider the transformation with the
same matrix on complex 2-space $\mathbb{C}^2$. Here the scalar field is the complex number
system, and $T$ is the map $\langle z_1, z_2\rangle \mapsto \langle -z_2, z_1\rangle$ from $\mathbb{C}^2$ to $\mathbb{C}^2$. But now
$\lambda^2 + 1 = (\lambda + i)(\lambda - i)$, and $T$ has eigenvalues $\pm i$! To find the eigenvectors
for $i$, we solve $T(z) = iz$, which is the equation $\langle -z_2, z_1\rangle = \langle iz_1, iz_2\rangle$, or
$z_2 = -iz_1$. Thus $\langle 1, -i\rangle$ (or $i\langle 1, -i\rangle = \langle i, 1\rangle$) is the unique
eigenvector for $i$ to within a scalar multiple.
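The contrast between the real and complex cases can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`char_poly_2x2`, `roots`, `apply`) are illustrative choices and not notation from the text.

```python
import cmath

# Matrix of the 90-degree rotation T: <x, y> -> <-y, x> in the standard basis.
t = [[0.0, -1.0],
     [1.0, 0.0]]

def char_poly_2x2(m):
    """Return (b, c) with det(m - lambda*e) = lambda^2 + b*lambda + c."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return -trace, det

def roots(b, c):
    """Both complex roots of lambda^2 + b*lambda + c."""
    disc = cmath.sqrt(b * b - 4 * c)
    return (-b + disc) / 2, (-b - disc) / 2

def apply(m, z):
    """Apply the 2x2 matrix m to the pair z."""
    return (m[0][0] * z[0] + m[0][1] * z[1],
            m[1][0] * z[0] + m[1][1] * z[1])

b, c = char_poly_2x2(t)          # coefficients of lambda^2 + 1
discriminant = b * b - 4 * c     # negative, so there are no real roots
eigenvalues = roots(b, c)        # the two roots over the complex field

z = (1, -1j)                     # the vector <1, -i>
Tz = apply(t, z)                 # compare with i * z
```

The negative discriminant reflects the irreducibility of $\lambda^2 + 1$ over $\mathbb{R}$, while `Tz` agrees with `1j * z` componentwise.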

We return to our real theory. If $T \in \operatorname{Hom} V$ and $n = d(V)$, so that
$d(\operatorname{Hom} V) = n^2$, then the set of $n^2 + 1$ vectors $\{T^i\}_0^{n^2}$ in $\operatorname{Hom} V$ is dependent.
But this is exactly the same as saying that $p(T) = 0$ for some polynomial $p$ of
degree $\le n^2$. That is, any $T$ in $\operatorname{Hom} V$ satisfies a polynomial identity $p(T) = 0$.
Now suppose that $r$ is an eigenvalue of $T$ and that $T(\xi) = r\xi$. Then $p(T)(\xi) =
p(r)\xi = 0$, and so $p(r) = 0$. That is, every eigenvalue of $T$ is a root of the
polynomial $p$. Conversely, if $p(r) = 0$, then we know from the remainder
theorem of algebra that $t - r$ is a factor of the polynomial $p(t)$, and therefore
$(t - r)^m$ will be one of the relatively prime factors of $p$. Now suppose that $p$ is
the minimal polynomial of $T$ (see Exercise 3.5). Theorem 5.5, Chapter 1, tells
us that $(T - rI)^m$ is zero on a corresponding subspace $N$ of $V$ and therefore, in
particular, that $T - rI$ is not injective when restricted to $N$. That is, $r$ is an
eigenvalue. We have proved:




Theorem 3.3. The eigenvalues of $T$ are zeros (roots) of every polynomial
$p(t)$ such that $p(T) = 0$, and are exactly the roots of the minimal polynomial.
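As a small illustration of Theorem 3.3, the sketch below uses a hypothetical diagonal matrix with eigenvalues 2, 2, 3 (chosen here for the example, not taken from the text). Its characteristic polynomial has degree 3, but the quadratic $(t - 2)(t - 3)$ already annihilates it, and the roots of that quadratic are exactly the eigenvalues.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(a, r):
    """Return the matrix a - r*e."""
    n = len(a)
    return [[a[i][j] - (r if i == j else 0) for j in range(n)]
            for i in range(n)]

def is_zero(a):
    return all(x == 0 for row in a for x in row)

# Hypothetical example: a diagonal matrix with eigenvalues 2, 2, 3.
t = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]

# p(t) = (t - 2)(t - 3) has degree 2 < 3, yet p(T) = 0.
p_of_t = matmul(shift(t, 2), shift(t, 3))

# Neither linear factor alone annihilates T, so no polynomial of
# degree 1 can, and p is the minimal polynomial.
linear_factor_vanishes = is_zero(shift(t, 2)) or is_zero(shift(t, 3))
```

Here `is_zero(p_of_t)` holds while `linear_factor_vanishes` is false, so the minimal polynomial has the two roots 2 and 3 and no others.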


EXERCISES 


3.1 Use the defining identity $(T\xi, \eta) = (\xi, T\eta)$ to show that the set $\mathcal{SA}$ of all
self-adjoint elements of Hom V is a subspace. Prove similarly that if $S$ and $T$ are
self-adjoint, then $ST$ is self-adjoint if and only if $ST = TS$. Conclude that if $T$ is
self-adjoint, then so is $p(T)$ for any polynomial $p$.

3.2 Show that if $T$ is self-adjoint, then $S = T^2$ is nonnegative. Show, therefore,
that if $T$ is self-adjoint and $a > 0$, then $T^2 + aI$ cannot be the zero transformation.

3.3 Let $p(t) = t^2 + bt + c$ be an irreducible quadratic polynomial ($b^2 < 4c$), and
let $T$ be a self-adjoint transformation. Show that $p(T) \ne 0$. (Complete the square
and apply earlier exercises.)




3.4 Let $T$ be self-adjoint and nilpotent ($T^n = 0$ for some $n$). Prove that $T = 0$.
This can be done in various ways. One method is to show it first for $n = 2$ and then
for $n = 2^m$ by induction. Finally, any $n$ can be bracketed by powers of 2,
$2^m \le n < 2^{m+1}$.


3.5 Let $V$ be any vector space, and let $T$ be an element of Hom V. Suppose that there
is a polynomial $q$ such that $q(T) = 0$, and let $p$ be such a polynomial of minimum
degree. Show that $p$ is unique (to within a constant multiple). It is called the minimal
polynomial of $T$. Show that if we apply Theorem 5.5 of Chapter 1 to the minimal
polynomial $p$ of $T$, then the subspaces $N_i$ must both be nontrivial.


3.6 It is a corollary of the fundamental theorem of algebra that a polynomial with
real coefficients can be factored into a product of linear factors $(t - r)$ and irreducible
quadratic factors $(t^2 + bt + c)$. Let $T$ be a self-adjoint transformation on a finite-
dimensional Hilbert space, and let $p(t)$ be its minimal polynomial. Deduce a new
proof of Theorem 3.1 by applying to $p(t)$ the above remark, Theorem 5.5 of Chapter 1,
and Exercises 3.1 through 3.4.


3.7 Prove that if $T$ is a self-adjoint transformation on a pre-Hilbert space $V$, then
its null space is the orthogonal complement of its range: $N(T) = (R(T))^\perp$. Conclude
that if $V$ is a Hilbert space, then a self-adjoint $T$ is injective if and only if its range is
dense (in $V$).

3.8 Assuming the above exercise, show that if V is a Hilbert space and T is a self- 
adjoint element of Hom V that is bounded below (as well as bounded), then T is sur- 
jective. 

3.9 Let $T$ be self-adjoint and nonnegative, and set $m = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1\}$.
Use the Schwarz inequality and the inequality of Lemma 3.2 to show that $m = \|T\|$.


3.10 Let $V$ be a Hilbert space, let $T$ be a self-adjoint element of Hom V, and set
$m = \operatorname{lub}\{(T\xi, \xi) : \|\xi\| = 1\}$. Show that if $a > m$, then $a - T$ ($= aI - T$) is
invertible and $\|(a - T)^{-1}\| \le 1/(a - m)$. (Apply the Schwarz inequality, the definition
of $m$, and Exercise 3.8.)

3.11 Let P be a bounded linear transformation on a pre-Hilbert space V that is a 
projection in the sense of Chapter 1. Prove that if P is self-adjoint, then P is an 
orthogonal projection. Now prove the converse. 

3.12 Let $V$ be a finite-dimensional Hilbert space, let $T$ in Hom V be self-adjoint,
and suppose that $S$ in Hom V commutes with $T$. Show that the subspaces $M_i$ of
Theorem 3.1 and Lemma 3.3 are invariant under $S$.

3.13 A self-adjoint transformation $T$ on a finite-dimensional Hilbert space $V$ is said
to have a simple spectrum if all its eigenvalues are distinct. By this we mean that all
the subspaces $M_i$ are one-dimensional. Suppose that $T$ is a self-adjoint transformation
with a simple spectrum, and suppose that $S$ commutes with $T$. Show that $S$ is also
self-adjoint. (Apply the above exercise.)

3.14 Let $H$ be a Hilbert space, and let $\omega[\xi, \eta]$ be a bounded bilinear form on $H \times H$.
That is, there is a constant $b$ such that

$$|\omega[\xi, \eta]| \le b\,\|\xi\|\,\|\eta\| \quad\text{for all } \xi, \eta \in H.$$

Show that there is a unique $T$ in Hom H such that $\omega[\xi, \eta] = (\xi, T\eta)$. Show that $T$ is
self-adjoint if and only if $\omega$ is symmetric.




4. ORTHOGONAL TRANSFORMATIONS


Assuming that $V$ is a Hilbert space and that therefore $\theta\colon V \to V^*$ is an
isomorphism, we can of course replace the adjoint $T^* \in \operatorname{Hom} V^*$ of any $T \in \operatorname{Hom} V$
by the corresponding transformation $\theta^{-1} \circ T^* \circ \theta \in \operatorname{Hom} V$. In Hilbert space
theory it is this mapping that is called the adjoint of $T$ and is designated $T^*$.
Then, exactly as in our discussion of a self-adjoint $T$, we see that

$$(T\alpha, \beta) = (\alpha, T^*\beta) \quad\text{for all } \alpha, \beta \in V$$

and that $T^*$ is uniquely defined by this identity. Finally, $T$ is self-adjoint if and
only if $T = T^*$.

Although it really amounts to the above way of introducing $T^*$ into Hom V,
we can make a direct definition as follows. For each $\eta$ the mapping $\xi \mapsto (T\xi, \eta)$
is linear and bounded, and so is an element of $V^*$, which, by Theorem 2.4, is
given by a unique element $\beta_\eta$ in $V$ according to the formula $(T\xi, \eta) = (\xi, \beta_\eta)$.
Now we check that $\eta \mapsto \beta_\eta$ is linear and bounded and is therefore an element of
Hom V which we call $T^*$, etc.

The matrix calculations of Lemma 3.1 generalize verbatim to show that the
matrix of $T^*$ in Hom V is the transpose $\mathbf{t}^*$ of the matrix $\mathbf{t}$ of $T$.

Another very important type of transformation on a Hilbert space is one 
that preserves the scalar product. 


Definition. A transformation $T \in \operatorname{Hom} V$ is orthogonal if $(T\alpha, T\beta) =
(\alpha, \beta)$ for all $\alpha, \beta \in V$.


By the basic adjoint identity above this is entirely equivalent to $(\alpha, T^*T\beta) =
(\alpha, \beta)$ for all $\alpha, \beta$, and hence to $T^*T = I$. An orthogonal $T$ is injective, since
$\|T\alpha\|^2 = \|\alpha\|^2$, and is therefore invertible if $V$ is finite-dimensional. Whether $V$
is finite-dimensional or not, if $T$ is invertible, then the above condition becomes
$T^* = T^{-1}$.

If $T \in \operatorname{Hom} \mathbb{R}^n$, the matrix form of the equation $T^*T = I$ is of course
$\mathbf{t}^*\mathbf{t} = \mathbf{e}$, and if this is written out, it becomes

$$\sum_{k=1}^{n} t_{ki}\, t_{kj} = \delta_i^j \quad\text{for all } i, j,$$

which simply says that the columns of $\mathbf{t}$ form an orthonormal set (and hence a
basis) in $\mathbb{R}^n$. We thus have:


Theorem 4.1. A transformation $T \in \operatorname{Hom} \mathbb{R}^n$ is orthogonal if and only if
the image of the standard basis $\{\delta^i\}_1^n$ under $T$ is another orthonormal basis
(with respect to the standard scalar product).


The necessity of this condition is, of course, obvious from the scalar-product- 
preserving definition of orthogonality, and the sufficiency can also be checked 
directly using the basis expansions of $\alpha$ and $\beta$.
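The column criterion of Theorem 4.1 is easy to test on small matrices. The sketch below is only an illustration in plain Python; the rotation angle, the shear matrix, and the test vectors are arbitrary choices made for the example.

```python
import math

def dot(u, v):
    """Standard scalar product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def columns(m):
    return [list(col) for col in zip(*m)]

def apply(m, v):
    return [dot(row, v) for row in m]

def is_orthogonal(m, tol=1e-12):
    """True if the columns of m form an orthonormal set (t*t = e)."""
    cols = columns(m)
    n = len(cols)
    return all(abs(dot(cols[i], cols[j]) - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

angle = 0.7  # arbitrary illustrative angle
rot = [[math.cos(angle), -math.sin(angle)],
       [math.sin(angle),  math.cos(angle)]]
shear = [[1.0, 1.0],
         [0.0, 1.0]]

# An orthogonal matrix preserves the scalar product of any two vectors.
a, b = [3.0, -1.0], [0.5, 2.0]
preserved = abs(dot(apply(rot, a), apply(rot, b)) - dot(a, b)) < 1e-12
```

The rotation passes `is_orthogonal`, the shear does not, and `preserved` confirms $(T\alpha, T\beta) = (\alpha, \beta)$ for the sample pair.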

We can now state the eigenbasis theorem in different terms. By a diagonal 
matrix we mean a matrix which is zero everywhere except on the main diagonal. 




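Theorem 4.2. Let $\mathbf{t} = \{t_{ij}\}$ be a symmetric $n \times n$ matrix. Then there
exists an orthogonal $n \times n$ matrix $\mathbf{b}$ such that $\mathbf{b}^{-1}\mathbf{t}\mathbf{b}$ is a diagonal matrix.


Proof. Since the transformation $T \in \operatorname{Hom} \mathbb{R}^n$ defined by $\mathbf{t}$ is self-adjoint, there
exists an orthonormal basis $\{\mathbf{b}^i\}_1^n$ of eigenvectors of $T$, with corresponding
eigenvalues $\{r_i\}_1^n$. Let $B$ be the orthogonal transformation defined by $B(\delta^j) =
\mathbf{b}^j$, $j = 1, \ldots, n$. (The $n$-tuples $\mathbf{b}^j$ are the columns of the matrix $\mathbf{b} = \{b_{ij}\}$
of $B$.) Then $(B^{-1} \circ T \circ B)(\delta^j) = r_j\delta^j$. Since $(B^{-1} \circ T \circ B)(\delta^j)$ is the $j$th
column of $\mathbf{b}^{-1}\mathbf{t}\mathbf{b}$, we see that $\mathbf{s} = \mathbf{b}^{-1}\mathbf{t}\mathbf{b}$ is diagonal, with $s_{jj} = r_j$. □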
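For a symmetric $2 \times 2$ matrix the orthogonal matrix $\mathbf{b}$ of Theorem 4.2 can be written down explicitly. The following is a minimal sketch, assuming the closed-form eigenvalues of a $2 \times 2$ symmetric matrix; the function name `diagonalize_symmetric_2x2` and the sample matrix are illustrative and do not come from the text.

```python
import math

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def diagonalize_symmetric_2x2(t):
    """Return (b, r): the columns of b are orthonormal eigenvectors of the
    symmetric matrix t, and r lists the matching eigenvalues."""
    a, off, d = t[0][0], t[0][1], t[1][1]
    if off == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]], [a, d]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, off)
    r1, r2 = mean + radius, mean - radius
    # (off, r1 - a) is an eigenvector for r1; a 90-degree turn gives one for r2.
    norm = math.hypot(off, r1 - a)
    c, s = off / norm, (r1 - a) / norm
    return [[c, -s], [s, c]], [r1, r2]

t = [[2.0, 1.0],
     [1.0, 2.0]]
b, r = diagonalize_symmetric_2x2(t)
s = matmul(transpose(b), matmul(t, b))   # b^{-1} t b, since b^{-1} = b^T
```

For this sample matrix the eigenvalues are 3 and 1, and `s` is diagonal with those entries up to rounding.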


For later applications we are also going to want the following result. 


Theorem 4.3. Any invertible $T \in \operatorname{Hom} V$ on a finite-dimensional Hilbert
space $V$ can be expressed in the form $T = RS$, where $R$ is orthogonal and $S$
is self-adjoint and positive.


Proof. For any $T$, $T^*T$ is self-adjoint, since $(T^*T)^* = T^*T^{**} = T^*T$. Let
$\{\varphi_i\}_1^n$ be an orthonormal eigenbasis, and let $\{r_i\}_1^n$ be the corresponding
eigenvalues of $T^*T$. Then $0 < \|T\varphi_i\|^2 = (T^*T\varphi_i, \varphi_i) = (r_i\varphi_i, \varphi_i) = r_i$ for each $i$.
Since all the eigenvalues of $T^*T$ are thus positive, we can define a positive square
root $S = (T^*T)^{1/2}$ by $S\varphi_i = (r_i)^{1/2}\varphi_i$, $i = 1, 2, \ldots, n$. It is clear that $S^2 =
T^*T$ and that $S$ is self-adjoint.

Then $A = ST^{-1}$ is orthogonal, for $(ST^{-1}\alpha, ST^{-1}\beta) = (T^{-1}\alpha, S^2T^{-1}\beta) =
(T^{-1}\alpha, T^*TT^{-1}\beta) = (T^{-1}\alpha, T^*\beta) = (TT^{-1}\alpha, \beta) = (\alpha, \beta)$. Since $T = A^{-1}S$,
we set $R = A^{-1}$ and have the theorem. □
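The construction in this proof can be carried out by hand for a $2 \times 2$ matrix. The sketch below is an illustration only: it assumes the standard closed form $\sqrt{M} = (M + \sqrt{\det M}\,\mathbf{e})/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$ for a symmetric positive definite $2 \times 2$ matrix $M$ (a consequence of the Cayley-Hamilton identity, not a formula from the text), and the sample matrix is an arbitrary invertible choice.

```python
import math

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def inverse(x):
    det = x[0][0] * x[1][1] - x[0][1] * x[1][0]
    return [[x[1][1] / det, -x[0][1] / det],
            [-x[1][0] / det, x[0][0] / det]]

def positive_sqrt(m):
    """Positive square root of a symmetric positive definite 2x2 matrix."""
    root_det = math.sqrt(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    scale = math.sqrt(m[0][0] + m[1][1] + 2 * root_det)
    return [[(m[0][0] + root_det) / scale, m[0][1] / scale],
            [m[1][0] / scale, (m[1][1] + root_det) / scale]]

def polar(t):
    """Return (r, s) with t = r s, r orthogonal, s symmetric and positive."""
    s = positive_sqrt(matmul(transpose(t), t))   # S = (T*T)^(1/2)
    r = matmul(t, inverse(s))                    # R = A^{-1} = T S^{-1}
    return r, s

t = [[2.0, 1.0],
     [0.0, 1.0]]   # an arbitrary invertible example
r, s = polar(t)
```

Multiplying `r` by `s` recovers `t`, and the columns of `r` are orthonormal, in line with the statement of Theorem 4.3.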


It is not hard to see that the above factorization of $T$ is unique. Also, by
starting with $TT^*$, we can express $T$ in the form $T = SR$, where $S$ is self-adjoint
and positive and $R$ is orthogonal.

We call these factorizations the polar decompositions of $T$, since they
function somewhat like the polar coordinate factorization $z = re^{i\theta}$ of a complex
number.


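Corollary. Any nonsingular $n \times n$ matrix $\mathbf{t}$ can be expressed as $\mathbf{t} = \mathbf{u}\mathbf{d}\mathbf{v}$,
where $\mathbf{u}$ and $\mathbf{v}$ are orthogonal and $\mathbf{d}$ is diagonal.


Proof. From the theorem we have $\mathbf{t} = \mathbf{r}\mathbf{s}$, where $\mathbf{r}$ is orthogonal and $\mathbf{s}$ is
symmetric. By Theorem 4.2, $\mathbf{s} = \mathbf{b}\mathbf{d}\mathbf{b}^{-1}$, where $\mathbf{d}$ is diagonal and $\mathbf{b}$ is orthogonal.
Thus $\mathbf{t} = \mathbf{r}\mathbf{s} = (\mathbf{r}\mathbf{b})\mathbf{d}\mathbf{b}^{-1} = \mathbf{u}\mathbf{d}\mathbf{v}$, where $\mathbf{u} = \mathbf{r}\mathbf{b}$ and $\mathbf{v} = \mathbf{b}^{-1}$ are both
orthogonal. □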


EXERCISES 


4.1 Let $V$ be a Hilbert space, and suppose that $S$ and $T$ in Hom V satisfy
$(T\xi, \eta) = (\xi, S\eta)$ for all $\xi, \eta$.
Write out the proof of the identity $S = \theta^{-1} \circ T^* \circ \theta$.




4.2 Write out the analogue of the proof of Lemma 3.1 which shows that the matrix 
of T* is the transpose of the matrix of T. 

4.3 Once again show that if $(\xi, \eta) = (\xi, \zeta)$ for all $\xi$, then $\eta = \zeta$. Conclude that if
$S$, $T$ in Hom V are such that $(\xi, T\eta) = (\xi, S\eta)$ for all $\xi$ and $\eta$, then $T = S$.

4.4 Let $\{\mathbf{a}, \mathbf{b}\}$ be an orthonormal basis for $\mathbb{R}^2$, and let $\mathbf{t}$ be the $2 \times 2$ matrix whose
columns are $\mathbf{a}$ and $\mathbf{b}$. Show by direct calculation that the rows of $\mathbf{t}$ are also
orthonormal.

4.5 State again why it is that if $V$ is finite-dimensional, and if $S$ and $T$ in Hom V
satisfy $S \circ T = I$, then $T$ is invertible and $S = T^{-1}$. Now let $V$ be a finite-dimensional
Hilbert space, and let $T$ be an orthogonal transformation in Hom V. Show that $T^*$ is
also orthogonal.

4.6 Let $\mathbf{t}$ be an $n \times n$ matrix whose columns form an orthonormal basis for $\mathbb{R}^n$.
Prove that the rows of $\mathbf{t}$ also form an orthonormal basis. (Apply the above exercise.)

4.7 Show that a nonnegative self-adjoint transformation $S$ on a finite-dimensional
Hilbert space has a uniquely determined nonnegative self-adjoint square root.

4.8 Prove that if $V$ is a finite-dimensional Hilbert space and $T \in \operatorname{Hom} V$, then the
“polar decomposition” of $T$, $T = RS$, of Theorem 4.3 is unique. (Apply the above
exercise.)


5. COMPACT TRANSFORMATIONS 


Theorem 3.1 breaks down when $V$ is an infinite-dimensional Hilbert space.
A self-adjoint transformation $T$ does not in general have enough eigenvectors
to form a basis for $V$, and a more sophisticated analysis, allowing for a
“continuous spectrum” as well as a “discrete spectrum”, is necessary. This
enriched situation is the reason for the need for further study of Hilbert space
theory at the graduate level, and is one of the sources of complexity in the
mathematical structure of quantum mechanics.

However, there is one very important special case in which the eigenbasis 
theorem is available, and which will have a startling application in the next 
chapter. 


Definition. Let $V$ and $W$ be any normed linear spaces, and let $S$ be the unit
ball in $V$. A transformation $T$ in Hom(V, W) is compact if the closure of
$T[S]$ in $W$ is sequentially compact.


Theorem 5.1. Let $V$ be any pre-Hilbert space, and let $T \in \operatorname{Hom} V$ be
self-adjoint and compact. Then the pre-Hilbert space $R = \operatorname{range}(T)$ has an
orthonormal basis $\{\varphi_i\}$ consisting entirely of eigenvectors of $T$, and the
corresponding sequence of eigenvalues $\{r_n\}$ converges to 0 (or is finite).


Proof. The proof is just like that of Theorem 3.1 except that we have to start a
little differently. Set $m = \|T\| = \operatorname{lub}\{\|T(\xi)\| : \|\xi\| = 1\}$, and choose a
sequence $\{\xi_n\}$ such that $\|\xi_n\| = 1$ for all $n$ and $\|T(\xi_n)\| \to m$. Then

$$((m^2 - T^2)\xi_n, \xi_n) = m^2 - \|T(\xi_n)\|^2 \to 0,$$

and since $m^2 - T^2$ is a nonnegative self-adjoint transformation, Lemma 3.2
tells us that $(m^2 - T^2)(\xi_n) \to 0$. But since $T$ is compact, we can suppose (passing
to a subsequence if necessary) that $\{T\xi_n\}$ converges, say to $\beta$. Then $T^2\xi_n \to T\beta$,
and so $m^2\xi_n \to T\beta$ also. Thus $\xi_n \to T\beta/m^2$ and $\beta = \lim T\xi_n = T^2(\beta)/m^2$.
Since $\|\beta\| = \lim \|T(\xi_n)\| = m$, we have a nonzero vector $\beta$ such that $T^2(\beta) =
m^2\beta$. Set $\alpha = \beta/\|\beta\|$.

We have thus found a vector $\alpha$ such that $\|\alpha\| = 1$ and $0 = (m^2 - T^2)(\alpha) =
(m - T)(m + T)(\alpha)$. Then either $(m + T)(\alpha) = 0$, in which case $T(\alpha) =
-m\alpha$, or $(m + T)(\alpha) = \gamma \ne 0$ and $(m - T)\gamma = 0$, in which case $T\gamma = m\gamma$.
Thus there exists a vector $\varphi_1$ ($\alpha$ or $\gamma/\|\gamma\|$) such that $\|\varphi_1\| = 1$ and $T(\varphi_1) =
r_1\varphi_1$, where $|r_1| = m$. We now proceed just as in Theorem 3.1.

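For notational consistency we set $m_1 = m$, $V_1 = V$, and now set $V_2 =
\{\varphi_1\}^\perp$. Then $T[V_2] \subset V_2$, since if $\alpha \perp \varphi_1$, then $(T\alpha, \varphi_1) = (\alpha, T\varphi_1) =
r_1(\alpha, \varphi_1) = 0$. Thus $T \restriction V_2$ is compact and self-adjoint, and if $m_2 = \|T \restriction V_2\|$,
there exists $\varphi_2$ with $\|\varphi_2\| = 1$ and $T(\varphi_2) = r_2\varphi_2$, where $|r_2| = m_2$. We continue
inductively, obtaining an orthonormal sequence $\{\varphi_n\} \subset V$ and a sequence
$\{r_n\} \subset \mathbb{R}$ such that $T\varphi_n = r_n\varphi_n$ and $|r_n| = \|T \restriction V_n\|$, where

$$V_n = \{\varphi_1, \ldots, \varphi_{n-1}\}^\perp.$$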


We suppose for the moment, since this is the most interesting case, that
$r_n \ne 0$ for all $n$. Then we claim that $|r_n| \to 0$. For $|r_n|$ is decreasing in any case,
and if it does not converge to 0, then there exists a $b > 0$ such that $|r_n| \ge b$
for all $n$. Then $\|T(\varphi_i) - T(\varphi_j)\|^2 = \|r_i\varphi_i - r_j\varphi_j\|^2 = \|r_i\varphi_i\|^2 + \|r_j\varphi_j\|^2 =
r_i^2 + r_j^2 \ge 2b^2$ for all $i \ne j$, and the sequence $\{T(\varphi_i)\}$ can have no convergent
subsequence, contradicting the compactness of $T$. Therefore $|r_n| \downarrow 0$.

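Finally, we have to show that the orthonormal sequence $\{\varphi_i\}$ is a basis for $R$.
If $\beta = T(\alpha)$, and if $\{b_n\}$ and $\{a_n\}$ are the Fourier coefficients of $\beta$ and $\alpha$,
then we expect that $b_n = r_n a_n$, and this is easy to check: $b_n = (\beta, \varphi_n) =
(T(\alpha), \varphi_n) = (\alpha, T(\varphi_n)) = (\alpha, r_n\varphi_n) = r_n(\alpha, \varphi_n) = r_n a_n$. This is just saying
that $T(a_n\varphi_n) = b_n\varphi_n$, and therefore $\beta - \sum_1^n b_i\varphi_i = T(\alpha - \sum_1^n a_i\varphi_i)$. Now
$\alpha - \sum_1^n a_i\varphi_i$ is orthogonal to $\{\varphi_i\}_1^n$ and therefore is an element of $V_{n+1}$, and the
norm of $T$ on $V_{n+1}$ is $|r_{n+1}|$. Moreover, $\|\alpha - \sum_1^n a_i\varphi_i\| \le \|\alpha\|$, by the
Pythagorean theorem. Altogether we can conclude that

$$\Bigl\|\beta - \sum_1^n b_i\varphi_i\Bigr\| \le |r_{n+1}|\,\|\alpha\|,$$

and since $r_{n+1} \to 0$, this implies that $\beta = \sum_1^\infty b_i\varphi_i$. Thus $\{\varphi_i\}$ is a basis for
$R(T)$. Also, since $T$ is self-adjoint, $N(T) = R(T)^\perp = \{\varphi_i\}^\perp = \bigcap_1^\infty V_i$.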

If some $r_i = 0$, then there is a first $n$ such that $r_n = 0$. In this
case $\|T \restriction V_n\| = |r_n| = 0$, so that $V_n \subset N(T)$. But $\varphi_i \in R(T)$ if $i < n$, since
then $\varphi_i = T(\varphi_i)/r_i$, and so $N(T) = R(T)^\perp \subset \{\varphi_1, \ldots, \varphi_{n-1}\}^\perp = V_n$.
Therefore, $N(T) = V_n$ and $R(T)$ is the span of $\{\varphi_i\}_1^{n-1}$. □


CHAPTER 6 


DIFFERENTIAL EQUATIONS 


This chapter is not a small differential equations textbook; we leave out far too 
much. We are principally concerned with some of the theory of the subject, 
although we shall say one or two practical things. Our first goal is the funda- 
mental existence and uniqueness theorem of ordinary differential equations, 
which we prove as an elegant application of the fixed-point theorem. Next we
look at the linear theory, where we make vital use of material from the first two 
chapters and get quite specific about the process of actually finding solutions. 
So far our development is linked to the initial-value problem, concerning the 
existence of, and in some cases the ways of finding, a unique solution passing
through some initially prescribed point in the space containing the solution 
curves. However, some of the most important aspects of the subject relate to 
what are called boundary-value problems, and our last and most sophisticated 
effort will be directed toward making a first step into this large area. This will 
involve us in the theory of Chapter 5, for we shall find ourselves studying self- 
adjoint operators. In fact, the basic theorem about Fourier series expansions 
will come out of recognizing a certain right inverse of a differential operator to be 
a compact self-adjoint operator.


1. THE FUNDAMENTAL THEOREM 


Let $A$ be an open subset of a Banach space $W$, let $I$ be an open interval in $\mathbb{R}$, and
let $F\colon I \times A \to W$ be continuous. We want to study the differential equation

$$d\alpha/dt = F(t, \alpha).$$

A solution of this equation is a function $f\colon J \to A$, where $J$ is an open subinterval
of $I$, such that $f'(t)$ exists and

$$f'(t) = F(t, f(t))$$

for every $t$ in $J$. Note that a solution $f$ has to be continuously differentiable, for
the existence of $f'$ implies the continuity of $f$, and then $f'(t) = F(t, f(t))$ is
continuous by the continuity of $F$.


We are going to see that if $F$ has a continuous second partial differential,
then there exists a uniquely determined “local” solution through any point
$\langle t_0, \alpha_0\rangle \in I \times A$.
In saying that the solution $f$ goes through $\langle t_0, \alpha_0\rangle$, we mean, of course, that
$\alpha_0 = f(t_0)$. The requirement that the solution $f$ have the value $\alpha_0$ when $t = t_0$
is called an initial condition.

The existence and continuity of $dF^2_{\langle t, \alpha\rangle}$ implies, via the mean-value
theorem, that $F(t, \alpha)$ is locally uniformly Lipschitz in $\alpha$. By this we mean that
for any point $\langle t_0, \alpha_0\rangle$ in $I \times A$ there is a neighborhood $M \times N$ and a constant
$b$ such that $\|F(t, \xi) - F(t, \eta)\| \le b\|\xi - \eta\|$ for all $t$ in $M$ and all $\xi, \eta$ in $N$.
To see this we simply choose balls $M$ and $N$ about $t_0$ and $\alpha_0$ such that $dF^2_{\langle t, \alpha\rangle}$ is
bounded, say by $b$, on $M \times N$, and apply Theorem 7.4 of Chapter 3. This is the
condition that we actually use below.


Theorem 1.1. Let $A$ be an open subset of a Banach space $W$, let $I$ be an
open interval in $\mathbb{R}$, and let $F$ be a continuous mapping from $I \times A$ to $W$
which is locally uniformly Lipschitz in its second variable. Then for any
point $\langle t_0, \alpha_0\rangle$ in $I \times A$, for some neighborhood $U$ of $\alpha_0$ and for any
sufficiently small interval $J$ containing $t_0$, there is a unique function $f$ from
$J$ to $U$ which is a solution of the differential equation passing through the
point $\langle t_0, \alpha_0\rangle$.


Proof. If $f$ is a solution on $J$ through $\langle t_0, \alpha_0\rangle$, then an integration gives

$$f(t) - f(t_0) = \int_{t_0}^{t} F(s, f(s))\,ds,$$

so that

$$f(t) = \alpha_0 + \int_{t_0}^{t} F(s, f(s))\,ds$$

for $t \in J$. Conversely, if $f$ satisfies this “integral equation”, then the
fundamental theorem of the calculus implies that $f'(t)$ exists and equals $F(t, f(t))$
on $J$, so that $f$ is a solution of the differential equation which clearly goes through
$\langle t_0, \alpha_0\rangle$. Now for any continuous $f\colon J \to A$ we can define $g\colon J \to W$ by

$$g(t) = \alpha_0 + \int_{t_0}^{t} F(s, f(s))\,ds,$$

and our argument above shows that $f$ is a solution of the differential equation if
and only if $f$ is a fixed point of the mapping $K\colon f \mapsto g$. This suggests that we try
to show that $K$ is a contraction, so that we can apply the fixed-point theorem.

We start by choosing a neighborhood $L \times U$ of $\langle t_0, \alpha_0\rangle$ on which $F(t, \alpha)$
is bounded and Lipschitz in $\alpha$ uniformly over $t$. Let $J$ be some open subinterval
of $L$ containing $t_0$, and let $V$ be the Banach space $\mathcal{BC}(J, W)$ of bounded
continuous functions from $J$ to $W$. Our later calculation will show how small we
have to take $J$. We assume that the neighborhood $U$ is a ball about $\alpha_0$ of radius
$r$, and we consider the ball of functions $\mathcal{U} = B_r(\bar{\alpha}_0)$ in $V$, where $\bar{\alpha}_0$ is the
constant function with value $\alpha_0$. Then any $f$ in $\mathcal{U}$ has its range in $U$, so that $F(t, f(t))$
is defined, bounded, and continuous. That is, $K$ as defined earlier maps the ball
$\mathcal{U}$ into $V$.




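We now calculate. Let $F$ be bounded by $m$ on $L \times U$ and let $\delta$ be the length
of $J$. Then

$$\|K(\bar{\alpha}_0) - \bar{\alpha}_0\|_\infty = \operatorname{lub}\Bigl\{\Bigl\|\int_{t_0}^{t} F(s, \alpha_0)\,ds\Bigr\| : t \in J\Bigr\} \le \delta m \qquad (1)$$

by the norm inequality for integrals (see Section 10 of Chapter 4). Also, if $f_1$
and $f_2$ are in $\mathcal{U}$, and if $c$ is a Lipschitz constant for $F$ on $L \times U$, then

$$\begin{aligned}
\|K(f_1) - K(f_2)\|_\infty &= \operatorname{lub}\Bigl\{\Bigl\|\int_{t_0}^{t} \bigl[F(s, f_1(s)) - F(s, f_2(s))\bigr]\,ds\Bigr\|\Bigr\} \\
&\le \delta \operatorname{lub}\{\|F(s, f_1(s)) - F(s, f_2(s))\|\} \\
&\le \delta c \operatorname{lub}\{\|f_1(s) - f_2(s)\|\} \\
&= \delta c\,\|f_1 - f_2\|_\infty. \qquad (2)
\end{aligned}$$

From (2) we see that $K$ is a contraction with constant $C = \delta c$ if $\delta c < 1$, and
from (1) we see that $K$ moves the center $\bar{\alpha}_0$ of the ball $\mathcal{U}$ a distance less than
$(1 - C)r$ if $\delta m < (1 - \delta c)r$. This double requirement on $\delta$ is equivalent to

$$\delta < \frac{r}{m + cr},$$

and with any such $\delta$ the theorem follows from a corollary of the fixed-point
theorem (Corollary 2 of Theorem 9.1, Chapter 4). □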
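The fixed-point iteration behind this proof can be run by hand in simple cases. The sketch below takes the hypothetical example $F(t, x) = x$ on $W = \mathbb{R}$ with $\alpha_0 = 1$ and $t_0 = 0$ (an illustration chosen here, with Lipschitz constant $c = 1$), represents each iterate as a polynomial coefficient list, and applies the map $K$ repeatedly starting from the constant function.

```python
from fractions import Fraction

def integrate(poly):
    """Antiderivative vanishing at 0 of a polynomial given by its
    coefficient list [c0, c1, c2, ...], where c_k multiplies t**k."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]

def K(f, alpha0=Fraction(1)):
    """K(f)(t) = alpha0 + integral from 0 to t of F(s, f(s)) ds,
    for the illustrative choice F(t, x) = x."""
    g = integrate(f)
    g[0] += alpha0
    return g

f = [Fraction(1)]          # f_0: the constant function with value alpha0 = 1
iterates = [f]
for _ in range(5):
    f = K(f)
    iterates.append(f)
```

Each application of `K` appends the next Taylor coefficient $1/k!$, so the iterates are the partial sums of the exponential series, which is the fixed point $f(t) = e^t$ of $K$.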


Corollary. The theorem holds if $F\colon I \times A \to W$ is continuous and has a
continuous second partial differential.


We next show that any two solutions through $\langle t_0, \alpha_0\rangle$ must agree on the
intersection of their domains (under the hypotheses of Theorem 1.1).


Lemma 1.1. Let $g_1$ and $g_2$ be any two solutions of $d\alpha/dt = F(t, \alpha)$ through
$\langle t_0, \alpha_0\rangle$. Then $g_1(t) = g_2(t)$ for all $t$ in the intersection $J = J_1 \cap J_2$ of
their domains.


Proof. Otherwise there is a point $s$ in $J$ such that $g_1(s) \ne g_2(s)$. Suppose that
$s > t_0$, and set $C = \{t : t > t_0 \text{ and } g_1(t) \ne g_2(t)\}$ and $x = \operatorname{glb} C$. The set $C$
is open, since $g_1$ and $g_2$ are continuous, and therefore $x$ is not in $C$. That is,
$g_1(x) = g_2(x)$. Call this common value $\alpha$ and apply the theorem to $\langle x, \alpha\rangle$.
With $r$ such that $B_r(\alpha) \subset A$, we choose $\delta$ small enough so that the differential
equation has a unique solution $g$ from $(x - \delta, x + \delta)$ to $B_r(\alpha)$ passing through
$\langle x, \alpha\rangle$, and we also take $\delta$ small enough so that the restrictions of $g_1$ and $g_2$ to
$(x - \delta, x + \delta)$ have ranges in $B_r(\alpha)$. But then $g_1 = g_2 = g$ on this interval
by the uniqueness of $g$, and this contradicts the definition of $x$. Therefore,
$g_1 = g_2$ on the intersection of their domains. □


This lemma allows us to remove the restriction on the range of f in the 
theorem. 


Theorem 1.2. Let $A$, $I$, and $F$ be as in Theorem 1.1. Then for any point
$\langle t_0, \alpha_0\rangle$ in $I \times A$ and any sufficiently small interval neighborhood $J$ of $t_0$,
there is a unique solution from $J$ to $A$ passing through $\langle t_0, \alpha_0\rangle$.




Fig. 6.1


Global solutions. The solutions we have found for the differential equation
$d\alpha/dt = F(t, \alpha)$ are defined only in sufficiently small neighborhoods of the initial
point $t_0$ and are accordingly called local solutions. Now if we run along to a
point $\langle t_1, \alpha_1\rangle$ near the end of such a local solution and then consider the local
solution about $\langle t_1, \alpha_1\rangle$, first of all it will have to agree with our first solution
on the intersection of the two domains, and secondly it will in general extend
farther beyond $t_1$ than the first solution, so the two local solutions will fit together
to make a solution on a larger $t$-interval than either gives separately. We can
continue in this way to extend our original solution to what might be called a
global solution, made up of a patchwork of matching local solutions. These
notions are somewhat vague as described above, and we now turn to a more
precise construction of a global solution.

Given $\langle t_0, \alpha_0\rangle \in I \times A$, let $\mathcal{F}$ be the family of all solutions through
$\langle t_0, \alpha_0\rangle$. Thus $g \in \mathcal{F}$ if and only if $g$ is a solution on an interval $J \subset I$, $t_0 \in J$,
and $g(t_0) = \alpha_0$. Lemma 1.1 shows exactly that the union† $f$ of all the functions $g$
in $\mathcal{F}$ is itself a function, for if $\langle t_1, \alpha_1\rangle \in g_1$ and $\langle t_1, \alpha_2\rangle \in g_2$, then $\alpha_1 =
g_1(t_1) = g_2(t_1) = \alpha_2$.

Moreover, $f$ is a solution, because around any $x$ in its domain $f$ agrees with
some $g \in \mathcal{F}$. By the way $f$ was defined we see that $f$ is the unique maximal
solution through $\langle t_0, \alpha_0\rangle$. We have thus proved the following theorem.


Theorem 1.3. Let $F\colon I \times A \to W$ be a function satisfying the hypotheses of
Theorem 1.1. Then through each $\langle t_0, \alpha_0\rangle$ in $I \times A$ there is a uniquely
determined maximal solution to the differential equation $d\alpha/dt = F(t, \alpha)$.


In general, we would have to expect a maximal solution to “run into the
boundary of $A$” and therefore to have a domain interval $J$ properly included in $I$,
as Fig. 6.1 suggests.


† Remember that we are taking a function to be a set of ordered pairs, so that the
union of a family of functions makes precise sense.




However, if $A$ is the whole space $W$, and if $F(t, \alpha)$ is Lipschitz in $\alpha$ for each $t$,
with a Lipschitz bound $c(t)$ that is continuous in $t$, then we can show that each
maximal solution is over the whole of $I$. We shall shortly see that this condition
is a natural one for the linear equation.


Theorem 1.4. Let $W$ be a Banach space, and let $I$ be an open interval in $\mathbb{R}$.
Let $F\colon I \times W \to W$ be continuous, and suppose that there is a continuous
function $c\colon I \to \mathbb{R}$ such that

$$\|F(t, \alpha_1) - F(t, \alpha_2)\| \le c(t)\,\|\alpha_1 - \alpha_2\|$$

for all $t$ in $I$ and all $\alpha_1, \alpha_2$ in $W$. Then each maximal solution to the
differential equation $d\alpha/dt = F(t, \alpha)$ has the whole of $I$ for its domain.


Proof. Suppose, on the contrary, that $g$ is a maximal solution whose domain
interval $J$ has right-hand endpoint $b$ less than that of $I$. We choose a finite open
interval $L$ containing $b$ and such that $\bar{L} \subset I$ (see Fig. 6.2). Since $\bar{L}$ is compact,
the continuous function $c(t)$ has a maximum value $c$ on $\bar{L}$. We choose any $t_1$
in $L \cap J$ close enough to $b$ so that $b - t_1 < 1/c$, and we set $\alpha_1 = g(t_1)$ and
$m = \max \|F(t, \alpha_1)\|$ on $\bar{L}$. With these values of $c$ and $m$, and with any $r$, the
proof of Theorem 1.1 gives us a local solution $f$ through $\langle t_1, \alpha_1\rangle$ with domain
$(t_1 - \delta, t_1 + \delta)$ for any $\delta$ less than $r/(m + rc) = 1/(c + (m/r))$. Since we
now have no restriction on $r$ (because $A = W$), this bound on $\delta$ becomes
$1/c$, and since we chose $t_1$ so that $t_1 + (1/c) > b$, we can now choose $\delta$ so that
$t_1 + \delta > b$. But this gives us a contradiction; the maximal solution $g$ through
$\langle t_1, \alpha_1\rangle$ includes the local solution $f$, so that, in particular, $t_1 + \delta \le b$.
We have thus proved the theorem. □
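The global Lipschitz hypothesis of Theorem 1.4 cannot simply be dropped. A standard example, supplied here as an illustration and not taken from the text, is $F(t, x) = x^2$ on $W = \mathbb{R}$: the solution through $\langle 0, 1\rangle$ is $1/(1 - t)$, whose maximal domain is $(-\infty, 1)$ even though $I = \mathbb{R}$. The sketch below follows the solution numerically with the classical Runge-Kutta method, which is an assumption of this example and not the method of the proof.

```python
def rk4(F, t0, x0, t1, steps):
    """Classical fourth-order Runge-Kutta approximation to x(t1)
    for dx/dt = F(t, x), x(t0) = x0."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = F(t, x)
        k2 = F(t + h / 2, x + h * k1 / 2)
        k3 = F(t + h / 2, x + h * k2 / 2)
        k4 = F(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def F(t, x):
    return x * x            # no single Lipschitz constant works on all of R

def exact(t):
    return 1.0 / (1.0 - t)  # solves x' = x^2, x(0) = 1 for t < 1

# Follow the solution toward the endpoint t = 1 of its maximal domain.
approx = [rk4(F, 0.0, 1.0, t1, 20000) for t1 in (0.5, 0.9, 0.99)]
```

The computed values track `exact(t)` and grow without bound as $t$ approaches 1, so the maximal solution cannot be extended to the whole of $I$.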


Fig. 6.2


Going back to our original situation, we can conclude that if the Lipschitz
control of $F$ is of the stronger type assumed above, and if the domain $J$ of some
maximal solution $g$ is less than $I$, then the open set $A$ cannot be the whole of $W$.
It is in fact true that the distance from $g(t)$ to the boundary of $A$ approaches
zero as $t$ approaches an endpoint $b$ of $J$ which is interior to $I$. That is, it is now a
theorem that $\rho(g(t), A') \to 0$ as $t \to b$. The proof is more complicated than our
argument above, and we leave it as a set of exercises for the interested reader.


The nth-order equation. Let $A_1, A_2, \ldots, A_n$ be open subsets of a Banach
space $W$, let $I$ be an open interval in $\mathbb{R}$, and let $G\colon I \times A_1 \times A_2 \times \cdots \times A_n \to W$
be continuous. We consider the differential equation

$$d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1}).$$




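A function $f\colon J \to W$ is a solution to this equation if $J$ is an open subinterval of
$I$, $f$ has continuous derivatives on $J$ up to the $n$th order, $f^{(i-1)}[J] \subset A_i$, $i =
1, \ldots, n$, and

$$f^{(n)}(t) = G(t, f(t), f'(t), \ldots, f^{(n-1)}(t))$$

for $t \in J$. An initial condition is now given by a point

$$\langle t_0, \beta_1, \beta_2, \ldots, \beta_n\rangle \in I \times A_1 \times \cdots \times A_n.$$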


The basic theorem is almost the same as before. To simplify our notation, let $\alpha$
be the $n$-tuple $\langle \alpha_1, \alpha_2, \ldots, \alpha_n\rangle$ in $W^n = V$, and set $A = \prod_1^n A_i$. Also let
$\psi$ be the mapping $f \mapsto \langle f, f', \ldots, f^{(n-1)}\rangle$. Then the solution equation
becomes $f^{(n)}(t) = G(t, \psi f(t))$.


Theorem 1.5. Let $G\colon I \times A \to W$ be as above and suppose, in addition,
that $G(t, \alpha)$ is locally uniformly Lipschitz in $\alpha$. Then for any $\langle t_0, \beta\rangle$ in
$I \times A$ and for any sufficiently small open interval $J$ containing $t_0$, there is a
unique function $f$ from $J$ to $W$ such that $f$ is a solution to the above $n$th-order
equation satisfying the initial condition $\psi f(t_0) = \beta$.
Proof. There is an ancient and standard device for reducing a single $n$th-order
equation to a system of first-order equations. The idea is to replace the single
equation

$$d^n\alpha/dt^n = G(t, \alpha, d\alpha/dt, \ldots, d^{n-1}\alpha/dt^{n-1})$$

by the system of equations

$$\begin{aligned}
d\alpha_1/dt &= \alpha_2, \\
d\alpha_2/dt &= \alpha_3, \\
&\;\;\vdots \\
d\alpha_{n-1}/dt &= \alpha_n, \\
d\alpha_n/dt &= G(t, \alpha_1, \alpha_2, \ldots, \alpha_n),
\end{aligned}$$

and then to recognize this system as equivalent to a single first-order equation
on a different space. In fact, if we define the mapping $F = \langle F^1, \ldots, F^n\rangle$
from $I \times A$ to $V = W^n$ by setting $F^i(t, \alpha) = \alpha_{i+1}$ for $i = 1, \ldots, n - 1$, and
$F^n(t, \alpha) = G(t, \alpha)$, then the above system becomes the single equation

$$d\alpha/dt = F(t, \alpha),$$

where $F$ is clearly locally uniformly Lipschitz in $\alpha$. Now a function $f =
\langle f_1, \ldots, f_n\rangle$ from $J$ to $V$ is a solution of this equation if and only if

$$\begin{aligned}
f_1' &= f_2, \\
f_2' &= f_3, \\
&\;\;\vdots \\
f_{n-1}' &= f_n, \\
f_n' &= G(t, f_1, \ldots, f_n),
\end{aligned}$$

that is, if and only if $f_1$ has derivatives up to order $n$, $\psi(f_1) = f$ and $f_1^{(n)}(t) =
G(t, \psi f_1(t))$. The $n$-tuplet initial condition $\psi f_1(t_0) = \beta$ is now just $f(t_0) = \beta$.
Thus the $n$th-order theorem for $G$ has turned into the first-order theorem for $F$,
and so follows from Theorems 1.1 and 1.2. □
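The reduction used in this proof is also how higher-order equations are usually handled numerically. The sketch below builds $F$ from $G$ exactly as in the proof and then integrates the hypothetical example $x'' = -x$, $x(0) = 0$, $x'(0) = 1$ (chosen for illustration, with solution $\sin t$). The Runge-Kutta integrator and the names `first_order_system` and `rk4` are assumptions of this example, not part of the text.

```python
import math

def first_order_system(G):
    """Turn x^(n) = G(t, x, x', ..., x^(n-1)) into da/dt = F(t, a),
    where a = [x, x', ..., x^(n-1)], so F^i = a_{i+1} and F^n = G."""
    def F(t, a):
        return a[1:] + [G(t, a)]
    return F

def rk4(F, t0, a0, t1, steps):
    """Classical Runge-Kutta for a vector equation da/dt = F(t, a)."""
    h = (t1 - t0) / steps
    t, a = t0, list(a0)
    for _ in range(steps):
        k1 = F(t, a)
        k2 = F(t + h / 2, [x + h / 2 * k for x, k in zip(a, k1)])
        k3 = F(t + h / 2, [x + h / 2 * k for x, k in zip(a, k2)])
        k4 = F(t + h, [x + h * k for x, k in zip(a, k3)])
        a = [x + h / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(a, k1, k2, k3, k4)]
        t += h
    return a

def G(t, a):
    return -a[0]             # the second-order equation x'' = -x

F = first_order_system(G)
x1, v1 = rk4(F, 0.0, [0.0, 1.0], 1.0, 1000)   # x(0) = 0, x'(0) = 1
```

The pair `[x1, v1]` approximates $\psi f(1) = \langle \sin 1, \cos 1\rangle$, so the first component of the first-order solution is the solution of the original second-order problem.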


The local solution through $\langle t_0, \beta\rangle$ extends to a unique maximal solution
by Theorem 1.3 applied to our first-order problem $d\alpha/dt = F(t, \alpha)$, and the
domain of the maximal solution is the whole of $I$ if $G(t, \alpha)$ is Lipschitz in $\alpha$ with
a bound $c(t)$ that is continuous and if $A = W^n$, as in Theorem 1.4.


EXERCISES 


1.1 Consider the equation $d\alpha/dt = F(t, \alpha)$ in the special case where $W = \mathbb{R}^2$.
Write out the equation as a pair of equations involving real-valued functions and real
variables.


1.2 Consider the system of differential equations

$$dx/dt = t + x^2 + y^2, \qquad dy/dt = \cos xy.$$

Define the function $F\colon \mathbb{R}^3 \to \mathbb{R}^2$ so that the above system becomes

$$d\alpha/dt = F(t, \alpha),$$

where $\alpha = \langle x, y\rangle$.


1.3 In the above exercise show that $F$ is uniformly Lipschitz in $\alpha$ on $\mathbb{R} \times A$, where
$A$ is any bounded open set in $\mathbb{R}^2$. Is $F$ uniformly Lipschitz on $\mathbb{R} \times \mathbb{R}^2$?


1.4 Write out the above system in terms of a solution function $f = \langle f_1, f_2\rangle$.
Write out for this system the integrated form used in proving Theorem 1.1.


1.5 The fixed-point theorem iteration sequence that we used in proving Theorem 1.1
starts off with $f_0$ as the constant function $\bar{\alpha}_0$ and then proceeds by

$$f_n(t) = \alpha_0 + \int_{t_0}^{t} F(s, f_{n-1}(s))\,ds.$$

Compute this sequence as far as $f_4$ for the differential equation

$$dx/dt = t + x \qquad [f'(t) = t + f(t)]$$

with the initial condition $f(0) = 0$. That is, take $f_0 = 0$ and compute $f_1$, $f_2$, $f_3$, and $f_4$
from the formula. Now guess the solution $f$ and verify it.

1.6 Compute the iterates $f_0$, $f_1$, $f_2$, and $f_3$ for the initial-value problem

$$dy/dx = x + y^2, \qquad y(0) = 0.$$

Supposing that the solution $f$ has a power series expansion about 0, what are its first
three nonzero terms?


1.7 Make the computation in the above exercise for the initial condition f(0) = —1. 
1.8 Do the same for f(0) = +1. 




1.9 Suppose that $W$ is a Banach space and that $F$ and $G$ are functions from $\mathbb{R} \times W^4$
to $W$ satisfying suitable Lipschitz conditions. Show how the second-order system

$$\xi'' = F(t, \xi, \eta, \xi', \eta'), \qquad \eta'' = G(t, \xi, \eta, \xi', \eta')$$

would be brought under our standard theory by making it into a single second-order
equation.


1.10 Answer the above exercise by converting it to a first-order system and then to a
single first-order equation.


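1.11 Let $\theta$ be a nonnegative, continuous, real-valued function defined on an interval
$[0, a] \subset \mathbb{R}$, and suppose that there are constants $b$ and $c > 0$ such that

$$\theta(x) \le c\int_0^x \theta(t)\,dt + bx \quad\text{for all } x \in [0, a].$$

a) Prove by induction that if $m = \|\theta\|_\infty$, then

$$\theta(x) \le m\,\frac{(cx)^n}{n!} + \frac{b}{c}\sum_{i=1}^{n}\frac{(cx)^i}{i!} \quad\text{for every } n.$$

b) Then prove that

$$\theta(x) \le \frac{b}{c}\,(e^{cx} - 1) \quad\text{for all } x.$$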


1.12 Let W be a Banach space, let I be an interval in R, and let F be a continuous
mapping from I × W to W. Suppose that ||F(t, α_0)|| ≤ b for all t ∈ I and that


    ||F(t, α) - F(t, β)|| ≤ c||α - β||


for all t in I and all α, β in W. Let f be the global solution through <t_0, α_0>, and
set θ(x) = ||f(t_0 + x) - α_0||. Prove that


    θ(x) ≤ c ∫_0^x θ(t) dt + bx


for x > 0 and t_0 + x in I. Then use the result in the above exercise to derive a much
stronger bound than we have in the text on the growth of the solution f(t) as t goes
away from t_0.


1.13 With the hypotheses on F as in the above exercise, show that the iteration
sequence for the solution through <t_0, α_0> converges on the whole of I by showing
inductively that if f_0 = α_0 and


    f_n(t) = α_0 + ∫_{t_0}^{t} F(s, f_{n-1}(s)) ds,
then
    ||f_n(t) - f_{n-1}(t)|| ≤ (b/c) (c|t - t_0|)^n / n!.
From these inequalities prove directly that the solution f through <t_0, α_0> satisfies


    ||f(t) - α_0|| ≤ (b/c)(e^{c|t - t_0|} - 1).




2. DIFFERENTIABLE DEPENDENCE ON PARAMETERS 


It is exceedingly important in some applications to know how the solution to the
system


    f'(t) = F(t, f(t)),    f(t_1) = α_1


varies with the initial point <t_1, α_1>. In order to state the problem precisely,
we fix an open interval J, set 𝒰 = B_r(ᾱ_0) ⊂ V = BC(J, W) as in the previous
section, and require a solution in 𝒰 passing through <t_1, α_1>, where <t_1, α_1>
is near <t_0, α_0>. Supposing that a unique solution f exists, we then have a
mapping <t_1, α_1> ↦ f, and it is the continuity and differentiability of this map
that we wish to study.


Theorem 2.1. Let L × U be a neighborhood of <t_0, α_0> in the Banach
space R × W, and let F(t, α) be a bounded continuous mapping from
L × U to W which is Lipschitz in α uniformly over t. Then there is a neigh-
borhood J × N of <t_0, α_0> with the following property. For any <t_1, α_1>
in J × N there is a unique function f from J to U which is a solution of the
differential equation dα/dt = F(t, α) passing through <t_1, α_1>, and the
mapping <t_1, α_1> ↦ f from J × N to V is continuous.


Proof. We simply reexamine the calculation of Theorem 1.1 and take δ a little
smaller. Let K(t_1, α_1, f) be the mapping of that theorem but with initial point
<t_1, α_1>, so that g = K(t_1, α_1, f) if and only if
g(t) = α_1 + ∫_{t_1}^{t} F(s, f(s)) ds for all t in J. Clearly K is continuous in
<t_1, α_1> for each fixed f.

If N is the ball B_{r/2}(α_0), then the inequality (1) in the proof shows that
||K(t_1, α_1, ᾱ_0) - ᾱ_0|| ≤ ||α_1 - α_0|| + δm < r/2 + δm. The second inequality
remains unchanged. Therefore, f ↦ K(t_1, α_1, f) is a map from 𝒰 to V which is a
contraction with constant C = δc if δc < 1, and which moves the center ᾱ_0 of 𝒰
a distance less than (1 - C)r if r/2 + δm < (1 - δc)r. This new double
requirement on δ is equivalent to

    δ < r / (2(m + cr)),

which is just half the old value. With J of length δ, we can now apply Theorem
9.2 of Chapter 4 to the map K(t_1, α_1, f) from (J × N) × 𝒰 to V, and so have
our theorem. □


If we want the map <t_1, α_1> ↦ f to be differentiable, it is sufficient, by
Theorem 9.4 of Chapter 4, to know in addition to the above that


    K: (J × N) × 𝒰 → V
is continuously differentiable. And to deduce this, it is sufficient to suppose that
dF exists and is uniformly continuous on L × U.


Theorem 2.2. Let L × U be a neighborhood of <t_0, α_0> in the Banach
space R × W, and let F(t, α) be a bounded mapping from L × U to W such
that dF exists, is bounded, and is uniformly continuous on L × U. Then, in
the context of the above theorem, the solution f is a continuously differ-
entiable function of the initial value <t_1, α_1>.


Proof. We have to show that the map K(t_1, α_1, f) from (J × N) × 𝒰 to V is con-
tinuously differentiable, after which we can apply Theorem 9.4 of Chapter 4, as
we remarked above. Now the mapping h ↦ k defined by k(t) = ∫_{t_1}^{t} h(s) ds is a
bounded linear mapping from V to V which clearly depends continuously on t_1,
and by Theorem 14.3 of Chapter 3 the integrand map f ↦ h defined by h(s) =
F(s, f(s)) is continuously differentiable on 𝒰. Composing these two maps we see
that dK^3_{<t_1, α_1, f>} exists and is continuous on J × N × 𝒰. Now


    ΔK^2_{<t_1, α_1, f>}(ξ) = ξ̄  (the constant function with value ξ),


so that dK^2 = I, and ΔK^1_{<t_1, α_1, f>}(h) = -∫_{t_1}^{t_1 + h} F(s, f(s)) ds, from which it
follows easily that dK^1_{<t_1, α_1, f>}(h) = -h F(t_1, f(t_1)). The three partial differentials
dK^1, dK^2, and dK^3 thus exist and are continuous on J × N × 𝒰, and it follows
from Theorem 8.3 of Chapter 3 that K(t_1, α_1, f) is continuously differentiable
there. □


Corollary. If s is any point in J, then the value f(s) of a solution at s is a
differentiable function of its value at t_0.


Proof. Let f_α be the solution through <t_0, α>. By the theorem, α ↦ f_α is a
continuously differentiable map from N to the function space V = BC(J, W).
But π_s: f ↦ f(s) is a bounded linear mapping and thus trivially continuously
differentiable. Composing these two maps, we see that α ↦ f_α(s) is continuously
differentiable on N. □
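
The corollary can be observed numerically. The sketch below is an illustration only: the
equation dx/dt = x^2 is a hypothetical example chosen because its solution through <0, a>
is known in closed form, a/(1 - as), so that the derivative of the value at s with respect
to a is 1/(1 - as)^2. The Runge-Kutta integrator and the function names are assumptions
of the sketch, not constructions from the text.

```python
def rk4_solve(F, t0, a0, s, steps=2000):
    """Approximate the value at s of the solution of dx/dt = F(t, x) through <t0, a0>
    using the classical fourth-order Runge-Kutta method."""
    h = (s - t0) / steps
    t, x = t0, a0
    for _ in range(steps):
        k1 = F(t, x)
        k2 = F(t + h / 2, x + h * k1 / 2)
        k3 = F(t + h / 2, x + h * k2 / 2)
        k4 = F(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

def F(t, x):
    # Hypothetical example equation dx/dt = x^2 (locally Lipschitz in x).
    return x * x

def value_at(s, a):
    """The map a -> f_a(s) of the corollary, with t0 = 0."""
    return rk4_solve(F, 0.0, a, s)

def derivative_wrt_initial_value(s, a, eps=1e-5):
    """Central difference quotient of a -> f_a(s)."""
    return (value_at(s, a + eps) - value_at(s, a - eps)) / (2 * eps)

approx = derivative_wrt_initial_value(1.0, 0.5)
exact = 1.0 / (1.0 - 0.5 * 1.0) ** 2
print(approx, exact)
```

The difference quotient settles near the closed-form derivative, which is what differentiable
dependence on the initial value predicts for this example.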


It is also possible to make the continuous and differentiable dependence of
the solution on its initial value <t_0, α_0> into a global affair. The following is
the theorem. We shall not go into its proof here.


Theorem 2.3. Let f be the maximal solution through <t_0, α_0> with domain
J, and let [a, b] be any finite closed subinterval of J containing t_0. Then there
exists an ε > 0 such that for every <t_1, α_1> ∈ B_ε(<t_0, α_0>) the domain
of the global solution through <t_1, α_1> includes [a, b], and the restriction
of this solution to [a, b] is a continuous function of <t_1, α_1>. If F satisfies
the hypotheses of Theorem 2.2, then this dependence is continuously
differentiable.


Finally, suppose that F depends continuously (or continuously differ-
entiably) on a parameter λ, so that we have F(λ, t, α) on M × I × A. Now the
solution f to the initial-value problem


    f'(t) = F(λ, t, f(t)),    f(t_1) = α_1


depends on the parameter λ as well as on the initial condition f(t_1) = α_1, and if
the reader has fully understood our arguments above, he will see that we can
show in the same way that the dependence of f on λ is also continuous (con-
tinuously differentiable). We shall not go into these details here.




3. THE LINEAR EQUATION 


We now suppose that the function F of Section 1 is from I × W to W and con-
tinuous, and that F(t, α) is linear in α for each fixed t. It is not hard to see that
we then automatically have the strong Lipschitz hypothesis of Theorem 1.4,
which we shall in any case now assume. Here this is a boundedness condition
on a linear map: we are assuming that F(t, α) = T_t(α), where T_t ∈ Hom W,
and that ||T_t|| ≤ c(t) for all t, where c(t) is continuous on I.

As one might expect, in this situation the existence and uniqueness theory of
Section 1 makes contact with general linear theory. Let X_0 be the vector space
C(I, W) of all continuous functions from I to W, and let X_1 be its subspace
C^1(I, W) of all functions having continuous first derivatives. Norms will play
no role in our theorem.


Theorem 3.1. The mapping S: X_1 → X_0 defined by setting g = Sf if
g(t) = f'(t) - F(t, f(t)) is a surjective linear mapping. The set N of global
solutions of the differential equation dα/dt = F(t, α) is the null space of S,
and is therefore, in particular, a vector space. For each t_0 ∈ I the restriction
to N of the coordinate (evaluation) mapping π_{t_0}: f ↦ f(t_0) is an isomorphism
from N to W. The null space M of π_{t_0} is therefore a complement of N in X_1,
and so determines a right inverse R of S. The mapping f ↦ <Sf, f(t_0)>
is an isomorphism from X_1 to X_0 × W, and this fact is equivalent to all
the above assertions.


Proof. For any fixed g in X_0 we set G(t, α) = F(t, α) + g(t) and consider the
(nonlinear) equation dα/dt = G(t, α). By Theorems 1.3 and 1.4 it has a unique
maximal solution f through any initial point <t_0, α_0>, and the domain of f is
the whole of I. That is, for each pair <g, α_0> in X_0 × W there is a unique f in
X_1 such that <Sf, f(t_0)> = <g, α_0>. The mapping


    <S, π_{t_0}>: f ↦ <Sf, f(t_0)>


is thus bijective, and since it is clearly linear, it is an isomorphism. In particular,
S is surjective. The null space N of S is the inverse image of {0} × W under the
above isomorphism; that is, π_{t_0} ↾ N is an isomorphism from N to W.

Finally, the null space M of π_{t_0} is the inverse image of X_0 × {0} under
<S, π_{t_0}>, and the direct sum decomposition X_1 = M ⊕ N simply reflects the
decomposition X_0 × W = (X_0 × {0}) ⊕ ({0} × W) under the inverse isomor-
phism. This finishes the proof of the theorem. □


The problem of finding, for a given g in X_0 and a given α_0 in W, the unique f
in X_1 such that S(f) = g and f(t_0) = α_0 is called the initial-value problem. At
the theoretical level, the problem is solved by the above theorem, which states
that the uniquely determined f exists. At the practical level of computation, the
problem remains important.

The fact that M = M_{t_0} is a complement of N breaks down the initial-value
problem into two independent subproblems. The right inverse R associated with
M_{t_0} finds h in X_1 such that S(h) = g and h(t_0) = 0. The inverse of the isomor-
phism f ↦ f(t_0) from N to W selects that k in X_1 such that S(k) = 0 and
k(t_0) = α_0. Then f = h + k. The first subproblem is the problem of “solving
the inhomogeneous equation with homogeneous initial data”, and the second is
the problem of “solving the homogeneous equation with inhomogeneous initial
data”. In a certain sense the initial-value problem is the “direct sum” of these
two independent problems.

We shall now study the homogeneous equation dα/dt = T_t(α) more closely.
As we saw above, its solution space N is isomorphic to W under each projection
map π_t: f ↦ f(t). Let φ_t be this isomorphism (so that φ_t = π_t ↾ N). We now
choose some fixed t_0 in I (we may as well suppose that I contains 0 and take
t_0 = 0) and set K_t = φ_t ∘ φ_0^{-1}. Then {K_t} is a one-parameter family of linear
isomorphisms of W with itself, and if we set f_β(t) = K_t(β), then f_β is the solution
of dα/dt = T_t(α) passing through <0, β>. We call K_t a fundamental solution of
the homogeneous equation dα/dt = T_t(α).

Since f_β'(t) = T_t(f_β(t)), we see that d(K_t)/dt = T_t ∘ K_t in the sense that
the equation is true at each β in W. However, the derivative d(K_t)/dt does not
necessarily exist as a norm limit in Hom W. This is because our hypotheses on T_t
do not imply that the mapping t ↦ T_t is continuous from I to Hom W. If this
mapping is continuous, then the mapping <t, A> ↦ T_t ∘ A is continuous from
I × Hom W to Hom W, and the initial-value problem


    dA/dt = T_t ∘ A,    A_0 = I


has a unique solution A_t in C^1(I, Hom W). Because evaluation at β is a bounded
linear mapping from Hom W to W, A_t(β) is a differentiable function of t and


    d(A_t(β))/dt = (dA_t/dt)(β) = T_t(A_t(β)).


This implies that A_t(β) = K_t(β) for all β, so K_t = A_t. In particular, the
fundamental solution t ↦ K_t is now a differentiable map into Hom W, and
dK_t/dt = T_t ∘ K_t. We have proved the following theorem.


Theorem 3.2. Let t ↦ T_t be a continuous map from an interval neighbor-
hood I of 0 to Hom W. Then the fundamental solution t ↦ K_t of the
differential equation dα/dt = T_t(α) is the parametrized arc from I to
Hom W that is the solution of the initial-value problem dA/dt = T_t ∘ A,
A_0 = I.
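
As a numerical illustration of Theorem 3.2, one can integrate the operator equation
dA/dt = T_t ∘ A with A_0 = I and check that applying the result to a vector β reproduces
the solution through <0, β>. This is a sketch under stated assumptions: W = R^2, the
coefficient map `T` below is a hypothetical example, and the Runge-Kutta integrator
stands in for the exact solution.

```python
import math

def T(t):
    """Hypothetical coefficient operator T_t on W = R^2, continuous in t."""
    return [[0.0, 1.0], [-(1.0 + 0.5 * math.cos(t)), 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def axpy(A, h, B):
    """Return A + h*B entrywise."""
    return [[a + h * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rhs(t, X):
    """Right-hand side T_t(X); X may be a 2x2 matrix or a 2x1 column."""
    return matmul(T(t), X)

def rk4(t0, X0, t1, steps=2000):
    """Classical Runge-Kutta integration of dX/dt = T_t X from t0 to t1."""
    h = (t1 - t0) / steps
    t, X = t0, X0
    for _ in range(steps):
        k1 = rhs(t, X)
        k2 = rhs(t + h / 2, axpy(X, h / 2, k1))
        k3 = rhs(t + h / 2, axpy(X, h / 2, k2))
        k4 = rhs(t + h, axpy(X, h, k3))
        incr = axpy(axpy(k1, 2.0, k2), 1.0, axpy(k4, 2.0, k3))  # k1 + 2k2 + 2k3 + k4
        X = axpy(X, h / 6, incr)
        t += h
    return X

identity = [[1.0, 0.0], [0.0, 1.0]]
K = rk4(0.0, identity, 2.0)        # approximates the fundamental solution K_t at t = 2

beta = [[3.0], [-1.0]]             # a column vector in W
f_direct = rk4(0.0, beta, 2.0)     # approximate solution through <0, beta>, evaluated at t = 2
f_via_K = matmul(K, beta)          # K_t(beta)
print(f_direct, f_via_K)
```

The two results agree up to rounding, reflecting the identity f_β(t) = K_t(β).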


In terms of the isomorphisms K_t = K(t), we can now obtain an explicit
solution for the inhomogeneous equation dα/dt = T_t(α) + g(t). We want f such
that


    f'(t) - T_t(f(t)) = g(t).


Now K'(t) = T_t ∘ K(t), so that T_t = K'(t) ∘ K(t)^{-1}, and it follows from
Exercise 8.12 of Chapter 4 and the general product rule for differentiation
(Theorem 8.4 of Chapter 3) that the left side of the equation above is exactly


    K(t)( d/dt [K(t)^{-1}(f(t))] ).


The equation we have to solve can thus be rewritten as


    d/dt [K(t)^{-1}(f(t))] = K(t)^{-1}(g(t)).


We therefore have an obvious solution, and even if the reader has found our
motivating argument too technical, he should be able to check the solution by
differentiating.


Theorem 3.3. In the context of Theorem 3.2, the function

    f(t) = K_t[ ∫_0^t K_s^{-1}(g(s)) ds ]

is the solution of the inhomogeneous initial-value problem
    dα/dt = T_t(α) + g(t),    f(0) = 0.


This therefore is a formula for the right inverse R of S determined by the
complement M_0 of the null space N of S.
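
The formula of Theorem 3.3 can be checked numerically in the simplest case W = R. The
sketch below is illustrative only: the coefficient T_t = cos t (whose fundamental solution
is K_t = e^{sin t}) and the forcing term g(t) = 1 + t are hypothetical choices, and the
integral is approximated by Simpson's rule.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def K(t):
    """Fundamental solution of dx/dt = cos(t) x with K(0) = 1 (scalar case)."""
    return math.exp(math.sin(t))

def g(t):
    """Hypothetical forcing term."""
    return 1.0 + t

def f(t):
    """f(t) = K_t [ integral from 0 to t of K_s^{-1} g(s) ds ]."""
    return K(t) * simpson(lambda s: g(s) / K(s), 0.0, t)

def residual(t, h=1e-4):
    """f'(t) - T_t f(t) - g(t), with f' approximated by a central difference."""
    fprime = (f(t + h) - f(t - h)) / (2 * h)
    return fprime - math.cos(t) * f(t) - g(t)

print(f(0.0), residual(1.3))
```

The residual is zero up to discretization error and f(0) = 0, as the theorem asserts.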


The special case of the constant coefficient equation, where the “coefficient”
operator T_t is a fixed T in Hom W, is extremely important. The first new fact
to be observed is that if f is a solution of dα/dt = T(α), then so is f'. For the
equation f'(t) = T(f(t)) has a differentiable right-hand side, and differentiating,
we get f''(t) = T(f'(t)). That is:


Lemma 3.1. The solution space N of the constant coefficient equation
dα/dt = T(α) is invariant under the derivative operator D.


Moreover, we see from the differential equation that the operator D on N
is just composition with T. More precisely, the equation f'(t) = T(f(t)) can be
rewritten π_t ∘ D = T ∘ π_t, and since the restriction of π_t to N is the isomor-
phism φ_t from N to W, this equation can be solved for T. We thus have the
following lemma.


Lemma 3.2. For each fixed t the isomorphism φ_t from N to W takes the
derivative operator D on N to the operator T on W. That is,
    T = φ_t ∘ D ∘ φ_t^{-1}.


The equation for the fundamental solution K_t is now dS/dt = TS. In the
elementary calculus this is the equation for the exponential function, which leads
us to expect and immediately check that K_t = e^{tT}. (See the end of Section 8 of
Chapter 4.) The solution of dα/dt = T(α) through <0, β> is thus the function


    e^{tT}β = Σ_{j=0}^{∞} (t^j / j!) T^j(β).




If T satisfies a polynomial equation p(T) = 0, as we know it must if W is
finite-dimensional, then our analysis can be carried significantly further. Suppose
for now that p has only real roots, so that its relatively prime factorization is
p(x) = Π_1^k (x - λ_i)^{m_i}. Then we know from Theorem 5.5 of Chapter 1 that W
is the direct sum W = ⊕_1^k W_i of the null spaces W_i of the transformations
(T - λ_i I)^{m_i}, and that each W_i is invariant under T. This gives us a much simpler
form for the solution curve e^{tT}α if the point α is in one of the null spaces W_i.
Taking such a subspace W_i itself as W for the moment, we have (T - λI)^m = 0,
so that T = λI + R, where R^m = 0, and the factorization e^{tT} = e^{tλ}e^{tR},
together with the now finite series expansion of e^{tR}, gives us


    e^{tT}α = e^{tλ}[α + tR(α) + ··· + (t^{m-1}/(m-1)!) R^{m-1}(α)].


Note that the number of terms on the right is the degree of the factor (x - λ)^m
in the polynomial p(x).

In the general situation where W = ⊕_1^k W_i, we have α = Σ_1^k α_i, e^{tT}(α) =
Σ_1^k e^{tT}(α_i), and each e^{tT}(α_i) is of the above form. The solution of f'(t) = T(f(t))
through the general point <0, α> is thus a finite sum of terms of the form
t^j e^{tλ_i} β_{ij}, the number of terms being the degree of the polynomial p.

If W is a complex Banach space, then the restriction that p have only real
roots is superfluous. We get exactly the same formula but with complex values
of λ. This introduces more variety into the behavior of the solution curves
since an outside exponential factor e^{tλ} = e^{tμ}e^{itν} now has a periodic factor if
ν ≠ 0.

Altogether we have proved the following theorem. 


Theorem 3.4. If W is a real or complex Banach space and T ∈ Hom W,
then the solution curve in W of the initial-value problem f'(t) = T(f(t)),
f(0) = β, is


    f(t) = e^{tT}β = Σ_{j=0}^{∞} (t^j / j!) T^j(β).

If T satisfies a polynomial equation (T - λI)^m = 0, then

    f(t) = e^{tλ}[β + tR(β) + ··· + (t^{m-1}/(m-1)!) R^{m-1}(β)],

where R = T - λI. If T satisfies a polynomial equation p(T) = 0 and p
has the relatively prime factorization p(x) = Π_1^k (x - λ_i)^{m_i}, then f(t) is a
sum of k terms of the above type, and so has the form


    f(t) = Σ_{i,j} t^j e^{tλ_i} β_{ij},


where the number of terms on the right is the degree of the polynomial p,
and each β_{ij} is a fixed (constant) vector.
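
The finite formula of Theorem 3.4 can be compared with the exponential series in a small
case. The sketch below is illustrative only: it takes W = R^3 and the hypothetical operator
T = λI + R, where R is the shift matrix with ones on the superdiagonal, so that R^3 = 0
and m = 3. The truncation length of the series is an arbitrary choice.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def exp_series(A, terms=60):
    """Partial sum of the series sum_j A^j / j! for a square matrix A."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, A)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def exp_jordan(lam, t):
    """e^{t lam} [I + tR + (t^2/2) R^2] for the 3x3 block T = lam*I + R with R^3 = 0."""
    e = math.exp(t * lam)
    return [[e, e * t, e * t * t / 2],
            [0.0, e, e * t],
            [0.0, 0.0, e]]

lam, t = -0.5, 2.0
tT = [[t * lam, t, 0.0],
      [0.0, t * lam, t],
      [0.0, 0.0, t * lam]]
series = exp_series(tT)
closed = exp_jordan(lam, t)
max_diff = max(abs(series[i][j] - closed[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

The two matrices agree to rounding error; applying either one to β gives the solution
value f(t) at t = 2.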




It is important to notice how the asymptotic behavior of f(t) as t → +∞ is
controlled by the polynomial roots λ_i. We first restrict ourselves to the solution
through a vector α in one of the subspaces W_i, which amounts to supposing that
(T - λI)^m = 0. Then if λ has a positive real part, so that e^{tλ} = e^{tμ}e^{itν} with
μ > 0, then ||f(t)|| → ∞ in exponential fashion. If λ has a negative real part,
then f(t) approaches zero as t → ∞ (but its norm becomes infinite exponentially
fast as t → -∞). If the real part of λ is zero, then ||f(t)|| → ∞ like t^{m-1} if
m > 1. Thus the only way for f to be bounded on the whole of R is for the real
part of λ to be zero and m = 1, in which case f is periodic. Similarly, in the
general case where p(T) = Π_1^k (T - λ_i I)^{m_i} = 0, it will be true that all the
solution curves are bounded on the whole of R if and only if the roots λ_i are all
pure imaginary and all the multiplicities m_i are 1.


EXERCISES 


3.1 Let I be an open interval in R, and let W be a normed linear space. Let F(t, α)
be a continuous function from I × W to W which is linear in α for each fixed t. Prove
that there is a function c(t) which is bounded on every closed interval [a, b] included
in I and such that ||F(t, α)|| ≤ c(t)||α|| for all α and t. Then show that c can be made
continuous. (You may want to use the Heine-Borel property: If [a, b] is covered by a
collection of open intervals, then some finite subcollection already covers [a, b].)

3.2 In the text we omitted checking that f ↦ f^{(n)} - G(f, f', ..., f^{(n-1)}) is sur-
jective from X_n to X_0. Prove that this is so by tracking down the surjectivity through
the reduction to a first-order system.


3.3 Suppose that the coefficients a_i(t) in the operator

    Tf = Σ_0^n a_i f^{(i)}


are all themselves in C^1. Show that the null space N of T is a subspace of C^{n+1}. State
a generalization of this theorem and indicate roughly why it is true.

3.4 Suppose that W is a Banach space, T ∈ Hom W, and β is an eigenvector of T
with eigenvalue r. Show that the solution of the constant coefficient equation dα/dt =
T(α) through <0, β> is f(t) = e^{rt}β.

3.5 Suppose next that W is finite-dimensional and has a basis {β_i}_1^n consisting of
eigenvectors of T, with corresponding eigenvalues r_i. Find a formula for the solution
through <0, α> in terms of the basis expansion of α.

3.6 A very important special case of the linear equation dα/dt = T_t(α) is when the
operator function T_t is periodic. Suppose, for example, that T_{t+1} = T_t for all t.
Show that then K_{t+n} = K_t(K_1)^n for all t and n.

Assume next that K_1 has a logarithm, and so can be written K_1 = e^A for some A
in Hom W. (We know from Exercise 11.19 of Chapter 4 that this is always possible
if W is finite-dimensional.) Show that now K_t can be written in the form


    K_t = B(t)e^{tA},
where B(t) is periodic with period 1.




3.7 Continuing the above exercise, suppose now that W is a finite-dimensional
complex vector space. Using the analysis of e^{tA} given in the text, show that the
differential equation dα/dt = T_t(α) has a periodic solution (with any period) only if
K_1 has an eigenvalue of absolute value 1. Show also that if K_1 has an nth root of
unity as an eigenvalue, then the differential equation has a periodic solution with
period n.

3.8 Write out the special form that the formula of Theorem 3.3 takes in the constant 
coefficient situation. 

3.9 It is interesting to look at the facts of Theorem 3.1 from the point of view of
Theorem 5.3 of Chapter 1. Assume that S: X_1 → X_0 is surjective and that its null
space N is isomorphic to W under the coordinate (evaluation) map π_{t_0}. Prove that if M
is the null space of π_{t_0} in X_1, then S ↾ M is an isomorphism onto X_0 by applying this
theorem.


4. THE nTH-ORDER LINEAR EQUATION

The nth-order linear differential equation is the equation
    d^nα/dt^n = G(t, α, dα/dt, ..., d^{n-1}α/dt^{n-1}),


where G(t, α) = G(t, α_1, ..., α_n) is now linear from V = W^n to W for each t in I.
We convert this to a first-order equation dα/dt = F(t, α) just as before, where
now F is a map from I × V to V that is linear in its second variable α, F(t, α) =
T_t(α).

Our proof of Theorem 1.5 showed that a function f in C^n(I, W) is a solution
of the nth-order equation d^nα/dt^n = G(t, α, ..., d^{n-1}α/dt^{n-1}) if and only if
the n-tuplet ψf = <f, f', ..., f^{(n-1)}> is a solution of the first-order equation
dα/dt = F(t, α) = T_t(α). We know that the latter solutions form a vector
subspace N of C^1(I, W^n), and since the map ψ: f ↦ <f, f', ..., f^{(n-1)}> is
linear from C^n(I, W) to C^1(I, W^n), it follows that the set 𝒩 of solutions of the
nth-order equation is a subspace of C^n(I, W) and ψ ↾ 𝒩 is an isomorphism from
𝒩 to N. Since the coordinate evaluation φ_t = π_t ↾ N is an isomorphism from N
to W^n for each t (Theorem 3.1), it follows that the map


    π_t ∘ ψ: f ↦ <f(t), f'(t), ..., f^{(n-1)}(t)>


takes 𝒩 isomorphically to W^n. Its null space M_t is a complement of 𝒩 in C^n, as
before. Here M_t is the set of functions f in C^n(I, W) such that f(t) = ··· =
f^{(n-1)}(t) = 0.

We now consider the special case W = R. For each fixed t, G is now a linear
map from R^n to R, that is, an element of (R^n)*, and its coordinate set with
respect to the standard basis is an n-tuple k = <k_1, ..., k_n>. Since the
linear map varies continuously with t, the n-tuple k varies continuously with t.
Thus, when we take t into account, we have an n-tuple k(t) = <k_1(t), ..., k_n(t)>
of continuous real-valued functions on I such that


    G(t, x_1, ..., x_n) = Σ_{i=1}^{n} k_i(t) x_i.




The solution space 𝒩 of the nth-order differential equation
    d^nα/dt^n = G(t, α, ..., d^{n-1}α/dt^{n-1})


is just the null space of the linear transformation L: C^n(I, R) → C^0(I, R)
defined by
    (Lf)(t) = f^{(n)}(t) - k_n(t)f^{(n-1)}(t) - ··· - k_1(t)f(t).


If we shift indices to coincide with the order of the derivative, and if we let f^{(n)}
also have a coefficient function, then our nth-order linear differential operator L
appears as


    (Lf)(t) = a_n(t)f^{(n)}(t) + ··· + a_0(t)f(t).


Giving f^{(n)} a coefficient function a_n changes nothing provided a_n(t) is never
zero, since then it can be divided out to give the form we have studied. This is
called the regular case. The singular case, where a_n(t) is zero for some t, requires
further study, and we shall not go into it here.

We recapitulate what our general linear theory tells us about this situation. 


Theorem 4.1. L is a surjective linear transformation from the space C^n(I)
of all real-valued functions on I having continuous derivatives through
order n to the space C^0(I) = C(I) of continuous functions on I. Its null
space 𝒩 is the solution space of our original differential equation. For
each t_0 in I the restriction to 𝒩 of the mapping φ_{t_0} ∘ ψ: f ↦ <f(t_0), ...,
f^{(n-1)}(t_0)> is an isomorphism from 𝒩 to R^n, and the set M_{t_0} of functions
f in C^n such that f(t_0) = ··· = f^{(n-1)}(t_0) = 0 is therefore a complement
of 𝒩 in C^n(I), and determines a linear right inverse of L.


The practical problem of “solving” the differential equation L(f) = g for f
when g is given falls into two parts. First we have to find the null space 𝒩 of L,
that is, we have to solve the homogeneous equation L(f) = 0. Since 𝒩 is an
n-dimensional vector space, the problem of delineating it is equivalent to finding
a basis, and this is clearly the efficient way to proceed. Our first problem there-
fore is to find n linearly independent solutions {u_i}_1^n of L(f) = 0. Our second
problem is to find a right inverse of L, that is, a linear way of picking one f such
that L(f) = g for each g. Here the obvious thing to do is to try to make the
formula of Theorem 3.3 into a practical computation. If v is one solution of
L(f) = g, then of course the set of all solutions is the affine subspace 𝒩 + v.

We shall start with the first problem, that of finding a basis {u_i}_1^n of solutions
to L(f) = 0. Unfortunately, there is no general method available, and we have
to be content with partial success. We shall see that we can easily solve the
first-order equation directly, and that if we can find one solution of the nth-
order equation, then we can reduce the problem to solving an equation of order
n - 1. Moreover, in the very important special case of an operator L with
constant coefficients, Theorem 3.4 gives a complete explicit solution.

The first-order homogeneous linear equation can be written in the form
y' + a(t)y = 0, where the coefficient of y' has been divided out. Dividing by y
and remembering that y'/y = (log y)', we see that, formally at least, a solution
is given by log y = -∫a(t) dt or y = e^{-∫a(t) dt}, and we can check it by inspec-
tion. Thus the equation y' + y/t = 0 has a solution y = e^{-log t} = 1/t, as the
reader might have noticed directly.

Suppose now that L is an nth-order operator and that we know one solution
u of Lf = 0. Our problem then is to find n - 1 solutions v_1, ..., v_{n-1} inde-
pendent of each other and of u. It might even be reasonable to guess that these
could be determined as solutions of an equation of order n - 1. We try to find
a second solution v(t) in the form c(t)u(t), where c(t) is an unknown function.
Our motivation, in part, is that such a solution would automatically be inde-
pendent of u unless c(t) turns out to be a constant.

Now if v(t) = c(t)u(t), then v' = cu' + c'u, and generally


    v^{(j)} = Σ_{i=0}^{j} C(j, i) c^{(i)} u^{(j-i)},

where C(j, i) is the binomial coefficient. If we write down L(v) = Σ_0^n a_j(t)v^{(j)}(t)
and collect those terms involving c(t), we get


    L(v) = c(t) Σ_0^n a_j u^{(j)} + terms involving c', ..., c^{(n)}

         = cL(u) + S(c') = S(c'),


where S is a certain linear differential operator of order n - 1 which can be
explicitly computed from the above formulas. We claim that solving S(f) = 0
solves our original problem. For suppose that {g_i}_1^{n-1} is a basis for the null
space of S, and set c_i(t) = ∫_0^t g_i. Then L(c_i u) = S(c_i') = S(g_i) = 0 for i =
1, ..., n - 1. Moreover, u, c_1u, ..., c_{n-1}u are independent, for if u =
Σ_1^{n-1} k_i c_i u, then 1 = Σ_1^{n-1} k_i c_i(t) and 0 = Σ_1^{n-1} k_i c_i'(t) = Σ_1^{n-1} k_i g_i(t), con-
tradicting the independence of the set {g_i}.

We have thus shown that if we can find one solution u of the nth-order
equation Lf = 0, then its complete solution is reduced to solving an equation
Sf = 0 of order n - 1 (although our independence argument was a little
sketchy).

This reduction procedure does not combine with the solution of the first-
order equation to build up a sequence of independent solutions of the nth-order
equation because, roughly speaking, it “works off the top instead of off the
bottom”. For the combination to be successful, we would have to be able to
find from a given nth-order operator a first-order operator S such that N(S) ⊂
N(L), and we can’t do this in general. However, we can do it when the coefficient
functions in L are all constants, although we shall in fact proceed differently.

Meanwhile it is valuable to note that a second-order equation Lf = 0 can
be solved completely if we can find one solution u, since the above argument
reduces the remaining problem to a first-order equation which can then be solved
by an integration, as we saw earlier. Consider, for instance, the equation
y'' - 2y/t^2 = 0 over any interval I not containing 0, so that a_0(t) = -2/t^2 is
continuous on I. We see by inspection that u(t) = t^2 is one solution. Then we
know that we can find a solution v(t) independent of u(t) in the form v(t) = t^2 c(t)
and that the problem will become a first-order problem for c'. We have, in fact,
v' = t^2 c' + 2tc and v'' = t^2 c'' + 4tc' + 2c, so that L(v) = v'' - 2v/t^2 =
t^2 c'' + 4tc', and L(v) = 0 if and only if (c')' + (4/t)c' = 0. Thus


    c' = e^{-∫4/t} = e^{-4 log t} = 1/t^4,    c = 1/t^3


(to within a scalar multiple; we only want a basis!), and v = t^2 c(t) = 1/t.

(The reader may wish to check that this is the promised solution.) The null
space of the operator L(f) = f'' - 2f/t^2 is thus the linear span of {t^2, 1/t}.
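
The reduction just carried out can be replayed numerically. This sketch is an illustration
only: it uses the relation c' = 1/u^2, which follows from the computation above because
L(cu) = c''u + 2c'u' when the equation has no y' term, and it fixes the base point t = 1 on
the interval (0, ∞). The function names and the use of Simpson's rule are assumptions of
the sketch.

```python
def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def u(t):
    """The known solution u(t) = t^2 of y'' - 2y/t^2 = 0."""
    return t * t

def c(t):
    """c(t) = integral from 1 to t of ds / u(s)^2, so that c' = 1/t^4."""
    return simpson(lambda s: 1.0 / u(s) ** 2, 1.0, t)

def v(t):
    """Second solution v = c u obtained by reduction of order."""
    return u(t) * c(t)

def residual(y, t, h=1e-3):
    """y''(t) - 2 y(t)/t^2, with y'' approximated by a second central difference."""
    second = (y(t + h) - 2 * y(t) + y(t - h)) / h ** 2
    return second - 2 * y(t) / t ** 2

print(v(2.0), residual(u, 2.0), residual(v, 2.0))
```

With this base point, v(t) = t^2/3 - 1/(3t), a linear combination of t^2 and 1/t, so it lies in
the span found above and is independent of u.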
We now turn to an important tractable case, the differential operator


    Lf = a_n f^{(n)} + a_{n-1} f^{(n-1)} + ··· + a_0 f,


where the coefficients a_i are constants and a_n might as well be taken to be 1. What
makes this case accessible is that now L is a polynomial in the derivative operator
D. That is, if Df = f', so that D^j f = f^{(j)}, then L = p(D), where p(x) =
Σ_0^n a_i x^i.

The most elegant, but not the most elementary, way to handle this equation
is to go over to the equivalent first-order system dx/dt = T(x) on R^n and to
apply the relevant theory from the last section.


Theorem 4.2. If p(t) = (t - b)^n, then the solution space 𝒩 of the con-
stant coefficient nth-order equation p(D)f = 0 has the basis


    {e^{bt}, te^{bt}, ..., t^{n-1}e^{bt}}.


If p(t) is a polynomial which has a relatively prime factorization p(t) =
Π_1^k p_i(t) with each p_i(t) of the above form, then the solution space of the
constant coefficient equation p(D)f = 0 has the basis ∪B_i, where B_i is the
above basis for the solution space 𝒩_i of p_i(D)f = 0.


Proof. We know that the mapping ψ: f ↦ <f, f', ..., f^{(n-1)}> is an isomor-
phism from the null space 𝒩 of p(D) to the null space N of dx/dt - T(x). It is
clear that ψ commutes with differentiation, ψ(Df) = <f', ..., f^{(n)}> = Dψ(f),
and since we know that N is invariant under D by Lemma 3.1, it follows (and can
easily be checked directly) that 𝒩 is invariant under D. By Lemma 3.2 we have
T = φ_t ∘ D ∘ φ_t^{-1}, which simply says that the isomorphism φ_t: N → R^n takes
the operator D on N into the operator T on R^n. Altogether φ_t ∘ ψ takes D
on 𝒩 into T on R^n, and since p(D) = 0 on 𝒩, it follows that p(T) = 0 on R^n.

We saw in Theorem 3.4 that if p(T) = 0 and p = (t - b)^n, then the
solution space N of dx/dt = T(x) is spanned by vectors of the form


    e^{bt}x_1, te^{bt}x_2, ..., t^{n-1}e^{bt}x_n.


The first coordinates of the n-tuple-valued functions g in N form the space 𝒩
(under the isomorphism f = ψ^{-1}g), and we therefore see that 𝒩 is spanned by
the functions e^{bt}, ..., t^{n-1}e^{bt}. Since 𝒩 is n-dimensional, and since there are n
of these functions, the spanning set forms a basis.

The remainder of the theorem can be viewed as the combination of the
above and the direct application of Theorem 5.5 of Chapter 1 to the equation
p(D) = 0 on 𝒩, or as the carry-over to 𝒩 under the isomorphism ψ^{-1} of the facts
already established for N in the last section. □


If the roots of the polynomial p are not all real, then we have to resort to
the complexification theory that we developed in the exercises of Section 11,
Chapter 4. Except for one final step, the results are the same. The one extra
fact that has to be applied is that the null space of a real operator T acting on a
real vector space Y is exactly the intersection with Y of the null space of the
complexification S of T acting on the complexification Z = Y ⊕ iY of Y.
This implies that if p(t) is a polynomial with real coefficients, then we get the
real solutions of p(D)f = 0 as the real parts of the complex solutions. In order
to see exactly what this means, suppose that q(x) = (x^2 - 2bx + c)^m is one of
the relatively prime factors of p(x) over R, with x^2 - 2bx + c irreducible over R.
Over C, q(x) factors into (x - λ)^m(x - λ̄)^m, where λ = b + iω and ω^2 = c - b^2.
It follows from our general theory above that the complex 2m-dimensional
null space of q(D) is the complex span of


    {e^{λt}, te^{λt}, ..., t^{m-1}e^{λt}, e^{λ̄t}, te^{λ̄t}, ..., t^{m-1}e^{λ̄t}}.


The real parts of the complex linear combinations of these 2m functions form a
2m-dimensional real vector space spanned by the real parts of the above functions
and the real parts of i times the above functions. That is, the null space of the
real operator q(D) is a 2m-dimensional real space spanned by


    {e^{bt} cos ωt, te^{bt} cos ωt, ..., t^{m-1}e^{bt} cos ωt; e^{bt} sin ωt, ..., t^{m-1}e^{bt} sin ωt}.


Since there are 2m of these functions, they must be independent and must form
a basis for the real solution space of q(D)f = 0. Thus,


Theorem 4.3. If p(t) = (t^2 - 2bt + c)^m and b^2 < c, then the solution space
of the constant coefficient 2mth-order equation p(D)f = 0 has the basis


    {t^i e^{bt} cos ωt}_{i=0}^{m-1} ∪ {t^i e^{bt} sin ωt}_{i=0}^{m-1},


where ω^2 = c - b^2. For any polynomial p(t) with real coefficients, if p(t) =
Π_1^k p_i(t) is its relatively prime factorization into powers of linear factors and
powers of irreducible quadratic factors, then the solution space 𝒩 of
p(D)f = 0 has the basis ∪_1^k B_i, where B_i is the basis for the null space of
p_i(D) that we displayed above if p_i(t) is a power of an irreducible quadratic,
and B_i is the basis of Theorem 4.2 if p_i(t) is a power of a linear factor.


Suppose, for example, that we want to find a basis for the null space of
$D^4 - 1$. Here $p(x) = x^4 - 1 = (x - 1)(x + 1)(x - i)(x + i)$. The
basis for the complex solution space is therefore $\{e^t, e^{-t}, e^{it}, e^{-it}\}$. Since
$e^{it} = \cos t + i\sin t$, the basis for the real solution space is $\{e^t, e^{-t}, \cos t, \sin t\}$.




The same problem for $D^3 - 1$ gives us

$$p(x) = x^3 - 1 = (x - 1)(x^2 + x + 1) = (x - 1)\left(x + \frac{1 + i\sqrt{3}}{2}\right)\left(x + \frac{1 - i\sqrt{3}}{2}\right),$$

so that the basis for the complex solution space is

$$\{e^t, e^{(-1 + i\sqrt{3})t/2}, e^{(-1 - i\sqrt{3})t/2}\},$$

and the basis for the real solution space is

$$\{e^t, e^{-t/2}\cos(\sqrt{3}\,t/2), e^{-t/2}\sin(\sqrt{3}\,t/2)\}.$$
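The passage from a complex root to a pair of real solutions can be checked numerically. The following is a minimal sketch, not part of the text: the function names are hypothetical, and it relies only on the fact that $D e^{\lambda t} = \lambda e^{\lambda t}$, so that $(D^3 - 1)e^{\lambda t} = (\lambda^3 - 1)e^{\lambda t}$.

```python
import cmath
import math

def apply_D3_minus_1_to_exp(lam, t):
    """Value at t of (D^3 - 1) applied to e^{lam t}, using D e^{lam t} = lam e^{lam t}."""
    return (lam ** 3 - 1) * cmath.exp(lam * t)

def real_pair(lam):
    """Real and imaginary parts of e^{lam t}, each as a function of real t."""
    return (lambda t: cmath.exp(lam * t).real,
            lambda t: cmath.exp(lam * t).imag)

# One of the two complex roots of x^3 - 1.
lam = complex(-0.5, math.sqrt(3) / 2)
cos_part, sin_part = real_pair(lam)
```

Here `cos_part` and `sin_part` are the functions $e^{-t/2}\cos(\sqrt{3}\,t/2)$ and $e^{-t/2}\sin(\sqrt{3}\,t/2)$ of the real basis, and the residual returned by `apply_D3_minus_1_to_exp` vanishes up to rounding.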


*Our results above suggest that the collection $\mathcal{A}$ of all real-valued
solutions of constant coefficient homogeneous linear differential equations
contains the functions $t^i$, $e^{rt}$, $\cos\omega t$, $\sin\omega t$ for all $i$, $r$, and $\omega$, and is closed under
addition and multiplication, and is in fact the algebra generated by these functions.

We can easily prove this conjecture. We first consider sums. Suppose that
$T(f) = 0$ and that $S(g) = 0$, where $S$ and $T$ are two such constant coefficient
operators. Then $f + g$ is in the null space of $S \circ T$ because $S$ and $T$ commute:
$(S \circ T)(f + g) = S \circ T(f) + S \circ T(g) = S(Tf) + T(Sg) = 0 + 0 = 0$. We
know that $S$ and $T$ commute because they are both polynomials in $D$.
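Since $S$ and $T$ are polynomials in $D$, composing them amounts to multiplying polynomials. The sketch below is an illustration under that identification: the helper names are hypothetical, and an operator is represented by its coefficient list, lowest degree first. A root $r$ of the product polynomial corresponds to a solution $e^{rt}$, and $\sin t$ is the imaginary part of $e^{it}$.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, x):
    """Evaluate a polynomial at x by Horner's rule."""
    acc = 0
    for coeff in reversed(p):
        acc = acc * x + coeff
    return acc

T = [-1, 1]           # D - 1, which annihilates e^t
S = [1, 0, 1]         # D^2 + 1, which annihilates sin t
ST = poly_mul(S, T)   # D^3 - D^2 + D - 1, which annihilates e^t + sin t
```

The product does not depend on the order of the factors, which is the commutativity used in the argument, and both $1$ and $i$ are roots of `ST`.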

In order to treat products, we first have to recognize that the linear span of
all the trigonometric functions $\sin at$, $\cos bt$ is an algebra. In other words, any
finite product of such functions is a linear combination of such functions. This
is the role of a certain class of trigonometric identities, such as $2\sin x\cos y =
\sin(x + y) + \sin(x - y)$, which the reader has undoubtedly had to struggle
with. (And again the mystery disappears when we are allowed to treat them
as complex exponentials.) Then we observe that any function in the algebra $\mathcal{A}$ is
a finite sum of terms each of which is of the form $t^i e^{rt}\sin\omega t$ or $t^i e^{rt}\cos\omega t$ for
some $i$, $r$, and $\omega$. We can exhibit an operator $T$ having such a function in its
null space, and our finite sum of such terms will then be in the null space of the
composition of these operators $T$ by our first argument.

We are tempted to say one more thing. The functions $t^i$, $e^{rt}$, $\sin\omega t$, $\cos\omega t$,
and sums of their products can be shown to be exactly the continuous functions
$f: \mathbb{R} \to \mathbb{R}$ such that the set of translates of $f$ has a finite-dimensional span. That
is, if we define translation through $x$, $K_x$, by $(K_x f)(t) = f(t - x)$, then for
exactly the above functions $f$ the linear span of $\{K_x f : x \in \mathbb{R}\}$ is finite-dimensional.
This second characterization of exactly the same class of functions cannot be
accidental. Part of the secret lies in the fact that the constant coefficient
operators $T$ are exactly those linear differential operators that commute with
translation. That is, if $T$ is a linear differential operator, then $T \circ K_x = K_x \circ T$ for
all $x$ if and only if $T$ has constant coefficients. Now we have noted in an early
chapter that if $T \circ S = S \circ T$, then the null space of $T$ is invariant under $S$.
Therefore, the null space $N$ of a constant coefficient operator $T$ is invariant under
all translations: $K_x[N] \subset N$ for all $x$. Now we know that $N$ is finite-dimensional
from our differential equation theory. Therefore, the functions in $N$ are such
that their translates have a finite-dimensional span!
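As a small worked instance of this invariance (an illustration added here, not taken from the text), take $f(t) = te^{rt}$, which lies in the null space of $(D - r)^2$:

```latex
\[
(K_x f)(t) = (t - x)\,e^{r(t - x)}
           = e^{-rx}\,\bigl(t e^{rt}\bigr) \;-\; x e^{-rx}\,\bigl(e^{rt}\bigr).
\]
```

Every translate of $f$ is thus a linear combination of $e^{rt}$ and $te^{rt}$, so the translates span a space of dimension at most two, namely the null space of $(D - r)^2$.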

This device of gaining additional information about the null space $N$ of a
linear operator $T$ by finding operators $S$ that commute with $T$, so that $N$ is
$S$-invariant, is much used in advanced mathematics. It is especially important
when we have a group of commuting operators $S$, as we do in the above case with
the operators $S = K_x$.

What we have not shown is that if a continuous function $f$ is such that its
translates generate a finite-dimensional vector space, then $f$ is in the null space
of some constant coefficient operator $p(D)$. This is delicate, and it depends on
showing that if $\{K_t\}$ is a one-parameter family of linear transformations on a
finite-dimensional space $V$ such that $K_{s+t} = K_s \circ K_t$ and $K_t \to I$ as $t \to 0$, then
there is an $S$ in $\operatorname{Hom} V$ such that $K_t = e^{tS}$.


EXERCISES 


Find solutions for the following equations. 


4.1 $x'' - 3x' + 2x = 0$          4.2 $x'' + 2x' - 3x = 0$
4.3 $x'' + 2x' + 3x = 0$          4.4 $x'' + 2x' + x = 0$
45 2” — 3c’ + 327’ 2 = 0 46 2’ —x=0 

4.7 2 —2% = 0 438 27” =0 


4,9 x’? — x =O 
4.10 Solve the initial-value problem $x'' + 4x' - 5x = 0$, $x(0) = 1$, $x'(0) = 2$.
4.11 Solve the initial-value problem 2” + 2’ = 0, r(0) = 0, 2’(0) = —1,2(0) = 1. 
4.12 Find one solution $u$ of the equation $4t^2x'' + x = 0$ by trying $u(t) = t^r$, and then
find a second solution as in the text by setting $v(t) = c(t)u(t)$.
4,13 Solve Ba” — 32’ + 3x = 0 by trying wf = @. 
4,14 Solve iz”+ 2’ = 0. 
4.15 Solve t(2’” + 2’) 4+ 227+ 2) = 0. 
4.16 Knowing that $e^{-bt}\cos\omega t$ and $e^{-bt}\sin\omega t$ are solutions of a second-order linear
differential equation, and observing that their values at 0 are 1 and 0, we know that
they are independent. Why?


4.17 Find constant coefficient differential equations of which the following functions
are solutions: $t^2$, $\sin t$, $t^2\sin t$.


4.18 If $f$ and $g$ are independent solutions of a second-order linear differential equation
$u'' + a_1u' + a_2u = 0$ with continuous coefficient functions, then we know that the
vectors $\langle f(x), f'(x)\rangle$ and $\langle g(x), g'(x)\rangle$ are independent at every point $x$. Show
conversely that if two functions have this latter property, then they are solutions
of a second-order differential equation.


4.19 Solve the equation (D — a)°f = 0 by applying the order-reducing procedure 
discussed in the text starting with the obvious solution $e^{at}$.




5. SOLVING THE INHOMOGENEOUS EQUATION 


We come now to the problem of solving the inhomogeneous equation $L(f) = g$.
We shall briefly describe a practical method which works easily some of the time
and a theoretical method which works all the time, but which may be hard to
apply. The latter is just the translation of Theorem 3.3 into matrix language.

We first consider the constant coefficient equation $L(f) = g$ in the special
case where $g$ itself is in the null space of a constant coefficient operator $S$. A
simple example is $y' - ay = e^{bt}$ (or $y' - ay = \sin bt$), where $g(t) = e^{bt}$ is in the
null space of $S = D - b$. In such a situation a solution $f$ must be in the
null space of $S \circ L$, for $S \circ L(f) = S(g) = 0$. We know what all these functions
are, and our problem is to select $f$ among them such that $L(f)$ is the given $g$.

For the moment suppose that the polynomials $L$ and $S$ (polynomials in $D$)
have no factors in common. Then we know that $L$ is an isomorphism on the
null space $N_S$ of $S$ and therefore that there exists an $f$ in $N_S$ such that $Lf = g$.
Since we have a basis for $N_S$, we could construct the matrix for the action of $L$ on
$N_S$ and find $f$ by solving a matrix equation, but the simplest thing to do is take
a general linear combination of the basis, with unknown coefficients, let $L$ act
on it, and see what the coefficients must be to give $g$.

For example, to solve $y' - ay = e^{bt}$, we try $f(t) = ce^{bt}$ and apply $L$:

$$(D - a)(ce^{bt}) = (b - a)ce^{bt} = e^{bt},$$

and we see that $c = 1/(b - a)$.

Again, to solve $y' - ay = \cos bt$, we observe that $\cos bt$ is in the null space
of $S = D^2 + b^2$ and that this null space has the basis $\{\sin bt, \cos bt\}$. We
therefore set $f(t) = c_1\sin bt + c_2\cos bt$ and solve $(D - a)f = \cos bt$, getting

$$(-ac_1 - bc_2)\sin bt + (bc_1 - ac_2)\cos bt = \cos bt,$$
$$-ac_1 - bc_2 = 0,$$
$$bc_1 - ac_2 = 1,$$

and

$$f = \frac{b}{a^2 + b^2}\sin bt - \frac{a}{a^2 + b^2}\cos bt.$$

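The two linear equations for $c_1$ and $c_2$ in the $\cos bt$ example can be solved mechanically. The following is a minimal sketch using exact rational arithmetic; the function name is hypothetical, and Cramer's rule is simply one convenient way to solve the $2 \times 2$ system.

```python
from fractions import Fraction

def cos_forcing_coeffs(a, b):
    """Solve  -a*c1 - b*c2 = 0,  b*c1 - a*c2 = 1  by Cramer's rule."""
    a, b = Fraction(a), Fraction(b)
    det = (-a) * (-a) - (-b) * b        # determinant of the system: a^2 + b^2
    c1 = (0 * (-a) - (-b) * 1) / det    # first column replaced by <0, 1>
    c2 = ((-a) * 1 - 0 * b) / det       # second column replaced by <0, 1>
    return c1, c2
```

For $a = 1$, $b = 2$ this returns $c_1 = 2/5$ and $c_2 = -1/5$, in agreement with $c_1 = b/(a^2 + b^2)$ and $c_2 = -a/(a^2 + b^2)$.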
When $L$ and $S$ do have factors in common, the situation is more complicated,
but a similar procedure can be proved to work. Now an extra factor $t^i$ must be
introduced, where $i$ is the number of occurrences of the common factor in $L$.
For example, in solving $(D - r)^2 f = e^{rt}$, we have $S \circ L = (D - r)^3$, and so
we must set $f(t) = ct^2e^{rt}$. Our equation then becomes

$$(D - r)^2 ct^2e^{rt} = 2ce^{rt} = e^{rt},$$

and so $c = \tfrac{1}{2}$.
For $(D^2 + 1)f = \sin t$ we have to set $f(t) = t(c_1\sin t + c_2\cos t)$, and after
we work it out we find that $c_1 = 0$ and $c_2 = -\tfrac{1}{2}$, so that $f = -\tfrac{1}{2}t\cos t$.




This procedure, called, naturally, the method of undetermined coefficients,
violates our philosophy about a solution process being a linear right inverse. Indeed,
it is not a single process, applicable to any $g$ occurring on the right, but varies
with the operator $S$. However, when it is available, it is the easiest way to
compute explicit solutions.

We describe next a general theoretical method, called variation of parameters,
that is a right inverse to $L$ and does therefore apply to every $g$. Moreover, it
inverts the general (variable coefficient) linear $n$th-order operator $L$:


$$(Lf)(t) = \sum_{i=0}^{n} a_i(t) f^{(i)}(t).$$

We are assuming that we know the null space $N$ of $L$; that is, we assume
known $n$ linearly independent solutions $\{u_i\}_1^n$ of the homogeneous equation
$Lf = 0$. What we are going to do is to translate into this context our formula
$K_t \int_0^t K_s^{-1}(\mathbf{g}(s))\,ds$ for the solution to $dx/dt = T_t(x) + \mathbf{g}(t)$. Since

$$\psi: f \mapsto \langle f, f', \ldots, f^{(n-1)}\rangle$$

is an isomorphism from the solution space $N$ of the $n$th-order equation $L(f) = 0$
to the solution space $\mathbf{N}$ of the equivalent first-order system $dx/dt = T_t(x)$, it
follows that if we have a basis $\{u_i\}_1^n$ for $N$, then the columns of the matrix
$w_{ij} = u_j^{(i-1)}$ form a basis for $\mathbf{N}$.

Let $w(t)$ be the matrix $w_{ij}(t) = u_j^{(i-1)}(t)$. Since evaluation at $t$ is an
isomorphism from $\mathbf{N}$ to $\mathbb{R}^n$, the columns of $w(t)$ form a basis for $\mathbb{R}^n$, for each $t$.
But $K_t(\alpha)$ is the value at $t$ of the solution of $dx/dt = T_t(x)$ through the initial
point $\langle 0, \alpha\rangle$, and it follows that the linear transformation $K_t$ takes the columns
of the matrix $w(0)$ to the corresponding columns of $w(t)$. The matrix for $K_t$ is
therefore $w(t) \cdot w(0)^{-1}$, and the matrix form of our formula

$$\mathbf{f}(t) = K_t \int_0^t (K_s)^{-1}(\mathbf{g}(s))\,ds$$

is therefore

$$\mathbf{f}(t) = w(t) \cdot w(0)^{-1} \cdot \int_0^t w(0) \cdot w(s)^{-1} \cdot \mathbf{g}(s)\,ds.$$


Moreover, since integration commutes with the application of a constant linear
transformation (here multiplication by a constant matrix), the middle $w(0)$
factors cancel, and we have the result that

$$\mathbf{f}(t) = w(t) \cdot \int_0^t w(s)^{-1} \cdot \mathbf{g}(s)\,ds$$

is the solution of $dx/dt = T_t(x) + \mathbf{g}(t)$ which passes through $\langle 0, 0\rangle$. Finally,
set $\mathbf{k}(s) = w(s)^{-1} \cdot \mathbf{g}(s)$, so that this solution formula splits into the pair

$$\mathbf{f}(t) = w(t) \cdot \int_0^t \mathbf{k}(s)\,ds, \qquad w(s) \cdot \mathbf{k}(s) = \mathbf{g}(s).$$




Now we want to solve the inhomogeneous $n$th-order equation $L(f) = g$, and
this means solving the first-order system with $\mathbf{g} = \langle 0, \ldots, 0, g\rangle$. Therefore,
the second equation above is equivalent to

$$\sum_j w_{ij}(s)k_j(s) = 0, \quad i < n,$$
$$\sum_j w_{nj}(s)k_j(s) = g(s).$$

Moreover, the solution $f$ of the $n$th-order equation is the first component of the
$n$-tuple $\mathbf{f}$ (that is, $f = \psi^{-1}\mathbf{f}$), and so we end up with

$$f(t) = \sum_j w_{1j}(t)\int_0^t k_j(s)\,ds = \sum_j u_j(t)c_j(t),$$

where $c_j(t)$ is the antiderivative $\int_0^t k_j(s)\,ds$. Any other antiderivative would do
as well, since the difference between the two resulting formulas is of the form
$\sum_j a_ju_j(t)$, a solution of the homogeneous equation $L(f) = 0$. We have proved
the following theorem.


Theorem 5.1. If $\{u_i(t)\}_1^n$ is a basis for the solution space of the homogeneous
equation $L(h) = 0$, and if $f(t) = \sum_1^n c_i(t)u_i(t)$, where the derivatives $c_i'(t)$
are determined as the solutions of the equations

$$\sum_i c_i'(t)u_i^{(j)}(t) = 0, \quad j = 0, \ldots, n - 2,$$
$$\sum_i c_i'(t)u_i^{(n-1)}(t) = g(t),$$

then $L(f) = g$.
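For $n = 2$ the two equations of the theorem can be solved for $c_1'$ and $c_2'$ by Cramer's rule and then integrated numerically. The sketch below is illustrative only: the function names are hypothetical, the leading coefficient of $L$ is assumed to be 1, the antiderivatives are normalized by $c_i(0) = 0$, and Simpson's rule is just one convenient quadrature.

```python
import math

def simpson(func, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

def particular_solution(u1, du1, u2, du2, g, t):
    """f(t) = c1(t) u1(t) + c2(t) u2(t), where c1', c2' solve
    c1' u1 + c2' u2 = 0 and c1' u1' + c2' u2' = g."""
    def wronskian(s):
        return u1(s) * du2(s) - u2(s) * du1(s)
    def c1_prime(s):
        return -u2(s) * g(s) / wronskian(s)
    def c2_prime(s):
        return u1(s) * g(s) / wronskian(s)
    c1 = simpson(c1_prime, 0.0, t)
    c2 = simpson(c2_prime, 0.0, t)
    return c1 * u1(t) + c2 * u2(t)

# Usage: y'' + y = t with the basis {sin, cos}; this normalization gives t - sin t.
value = particular_solution(math.sin, math.cos,
                            math.cos, lambda s: -math.sin(s),
                            lambda s: s, 1.0)
```

With this normalization the computed particular solution of $y'' + y = t$ is $t - \sin t$, which differs from the obvious solution $y = t$ by an element of the null space, as the remark on other antiderivatives predicts.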


We now consider a simple example of this method. The equation $y'' + y =
\sec x$ has constant coefficients, and we can therefore easily find the null space of
the homogeneous equation $y'' + y = 0$. A basis for it is $\{\sin x, \cos x\}$. But we can't
use the method of undetermined coefficients, because $\sec x$ is not a solution of a
constant coefficient equation. We therefore try for a solution

$$v(x) = c_1(x)\sin x + c_2(x)\cos x.$$

Our system of equations to be solved is

$$c_1'\sin x + c_2'\cos x = 0,$$
$$c_1'\cos x - c_2'\sin x = \sec x.$$

Thus $c_2' = -c_1'\tan x$ and $c_1'(\cos x + \sin x\tan x) = \sec x$, giving

$$c_1' = 1, \qquad c_2' = -\tan x,$$
$$c_1 = x, \qquad c_2 = \log\cos x,$$

and

$$v(x) = x\sin x + (\log\cos x)\cos x.$$

(Check it!)
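One way to carry out the suggested check without differentiating by hand is a finite-difference test. This is a rough numerical sketch, not a proof: the step size `h` and the sample points are arbitrary choices, and $x$ must stay inside $(-\pi/2, \pi/2)$ so that $\log\cos x$ is defined.

```python
import math

def v(x):
    """Candidate particular solution x sin x + (log cos x) cos x."""
    return x * math.sin(x) + math.log(math.cos(x)) * math.cos(x)

def residual(x, h=1e-4):
    """Central-difference estimate of v''(x) + v(x) - sec x."""
    second = (v(x + h) - 2 * v(x) + v(x - h)) / (h * h)
    return second + v(x) - 1 / math.cos(x)
```

The residual is zero up to discretization and rounding error at each sample point, which is consistent with $v'' + v = \sec x$.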




This is all we shall say about the process of finding solutions. In cases where
everything works we have complete control of the solutions of $L(f) = g$, and
we can then solve the initial-value problem. If $L$ has order $n$, then we know that
the null space $N$ is $n$-dimensional, and if for a given $g$ the function $v$ is one
solution of the inhomogeneous equation $L(f) = g$, then the set of all solutions is
the $n$-dimensional plane (affine subspace) $M = N + v$. If we have found a
basis $\{u_i\}_1^n$ for $N$, then every solution of $L(f) = g$ is of the form $f = \sum_1^n c_iu_i + v$.
The initial-value problem is the problem of finding $f$ such that $L(f) = g$ and
$f(t_0) = \alpha_1^0$, $f'(t_0) = \alpha_2^0$, ..., $f^{(n-1)}(t_0) = \alpha_n^0$, where $\langle\alpha_1^0, \ldots, \alpha_n^0\rangle = \alpha^0$ is the
given initial value. We can now find this unique $f$ by using these $n$ conditions
to determine the $n$ coefficients $c_i$ in $f = \sum_1^n c_iu_i + v$. We get $n$ equations in the $n$
unknowns $c_i$. Our ability to solve this problem uniquely again comes back to the
fact that the matrix $w_{ij}(t_0) = u_j^{(i-1)}(t_0)$ is nonsingular, as did our success in
carrying out the variation of parameters process.

We conclude this section by discussing a very simple and important example.
When a perfectly elastic spring is stretched or compressed, it resists with a
"restoring" force proportional to its deformation. If we picture a coiled spring
lying along the $x$-axis, with one end fixed and the free end at the origin when
undisturbed (Fig. 6.3), then when the coil is stretched a distance $x$ (compression
being negative stretching), the force it exerts is $-cx$, where $c$ is a constant
representing the stiffness, or elasticity, of the spring, and the minus sign shows
that the force is in the direction opposite to the displacement. This is Hooke's
law.

Fig. 6.3


Suppose that we attach a point mass $m$ to the free end of the spring, pull the
spring out to an initial position $x_0 = a$, and let go. The reader knows perfectly
well that the system will then oscillate, and we want to describe its vibration
explicitly. We disregard the mass of the spring itself (which amounts to
adjusting $m$), and for the moment we suppose that friction is zero, so that the
system will oscillate forever. Newton's law says that if the force $F$ is applied to
the mass $m$, then the particle will accelerate according to the equation

$$m\frac{d^2x}{dt^2} = F.$$

Here $F = -cx$, so the equation combining the laws of Newton and Hooke is

$$m\frac{d^2x}{dt^2} + cx = 0.$$




This is almost the simplest constant coefficient equation, and we know that the
general solution is

$$x = c_1\sin\Omega t + c_2\cos\Omega t,$$

where $\Omega = \sqrt{c/m}$. Our initial condition was that $x = a$ and $x' = 0$ when $t = 0$.
Thus $c_2 = a$ and $c_1 = 0$, so $x = a\cos\Omega t$. The particle oscillates forever
between $x = -a$ and $x = a$. The maximum displacement $a$ is called the
amplitude $A$ of the oscillation. The number of complete oscillations per unit time
is called the frequency $f$, so $f = \Omega/2\pi = \sqrt{c}/2\pi\sqrt{m}$. This is the quantitative
expression of the intuitively clear fact that the frequency will increase with the
stiffness $c$ and decrease as the mass $m$ increases. Other initial conditions are
equally reasonable. We might consider the system originally at rest and strike
it, so that we start with an initial velocity $v$ and an initial displacement 0 at
time $t = 0$. Now $c_2 = 0$ and $x = c_1\sin\Omega t$. In order to evaluate $c_1$, we
remember that $dx/dt = v$ at $t = 0$, and since $dx/dt = c_1\Omega\cos\Omega t$, we have $v = c_1\Omega$
and $c_1 = v/\Omega$, the amplitude for this motion. In general, the initial condition
would be $x = a$ and $x' = v$ when $t = 0$, and the unique solution thus determined
would involve both terms of the general solution, with amplitude to be calculated.
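The general initial condition can be handled in a few lines of code. The sketch below uses a hypothetical function name and assumes the standard fact that $c_1\sin\Omega t + c_2\cos\Omega t$ has amplitude $\sqrt{c_1^2 + c_2^2}$ (the rewriting asked for in Exercise 5.14); the coefficients $c_1 = v/\Omega$ and $c_2 = a$ come from the two special cases worked out above.

```python
import math

def harmonic_motion(m, c, a, v):
    """Frictionless spring m x'' + c x = 0 with x(0) = a and x'(0) = v.

    Returns (Omega, frequency, amplitude, x), where x is the solution function.
    """
    omega = math.sqrt(c / m)
    c1 = v / omega            # coefficient of sin(Omega t), fixed by x'(0) = v
    c2 = a                    # coefficient of cos(Omega t), fixed by x(0) = a
    amplitude = math.hypot(c1, c2)
    frequency = omega / (2 * math.pi)

    def x(t):
        return c1 * math.sin(omega * t) + c2 * math.cos(omega * t)

    return omega, frequency, amplitude, x

# Usage: m = 1, c = 4 gives Omega = 2; a = 3 and v = 8 give amplitude 5.
omega, frequency, amplitude, x = harmonic_motion(1.0, 4.0, 3.0, 8.0)
```

Setting $v = 0$ or $a = 0$ recovers the two special motions described in the text, with amplitudes $a$ and $v/\Omega$ respectively.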

The situation is both more realistic and more interesting when friction is
taken into account. Frictional resistance is ideally a force proportional to the
velocity $dx/dt$ but again with a negative sign, since its direction is opposite to
that of the motion. Our new equation is thus

$$m\frac{d^2x}{dt^2} + k\frac{dx}{dt} + cx = 0,$$

and we know that the system will act in quite different ways depending on the
relationship among the constants $m$, $k$, and $c$. The reader will be asked to explore
these equations further in the exercises.

It is extraordinary that exactly the same equation governs a freely oscillating
electric circuit. It is now written

$$L\frac{d^2x}{dt^2} + R\frac{dx}{dt} + \frac{x}{C} = 0,$$

where $L$, $R$, and $C$ are the inductance, resistance, and capacitance of the circuit,
respectively, and $dx/dt$ is the current. However, the ordinary operation of such
a circuit involves forced rather than free oscillation. An alternating (sinusoidal)
voltage is applied as an extra, external, "force" term, and the equation is now

$$L\frac{d^2x}{dt^2} + R\frac{dx}{dt} + \frac{x}{C} = a\sin\omega t.$$

This shows the most interesting behavior of all. Using the method of
undetermined coefficients, we find that the solution contains transient terms that
die away, contributed by the homogeneous equation, and a permanent part of
frequency $\omega/2\pi$, arising from the inhomogeneous term $a\sin\omega t$. New phenomena
called phase and resonance now appear, as the reader will discover in the exercises.
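The size of the permanent part can be explored numerically. The following sketch assumes the standard steady-state result that the current $dx/dt$ has amplitude $a/\sqrt{R^2 + X^2}$ with $X = L\omega - 1/\omega C$, which is the formula the reader is asked to derive in Exercises 5.19 and 5.20; the function names are hypothetical.

```python
import math

def current_amplitude(L, R, C, a, omega):
    """Steady-state amplitude of the current for L x'' + R x' + x/C = a sin(omega t)."""
    X = L * omega - 1 / (omega * C)
    return a / math.sqrt(R * R + X * X)

def resonant_omega(L, C):
    """Angular frequency at which X = 0, so that the current amplitude is largest."""
    return 1 / math.sqrt(L * C)

# Usage: L = 2 and C = 0.5 give resonance at omega = 1, where the amplitude is a / R.
peak = current_amplitude(2.0, 6.0, 0.5, 3.0, resonant_omega(2.0, 0.5))
```

Away from the resonant value of $\omega$ the term $X^2$ is positive, so the amplitude is strictly smaller than $a/R$.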


EXERCISES 


Find particular solutions of the following equations. 
5.1 x’ — 2x = t# 5.2 2” —x = sind 5.3 2 —x =sini+# 
5.4 xt 2 = sint 5.5 oy” — yf = x? (Here y’ = dy/dzx.) 
5.6 yy =e 
5.7 Consider the equation $y'' + y = \sec x$ that was solved in the text. To what
interval $I$ must we limit our discussion? Check that the particular solution found in
the text is correct. Solve the initial-value problem for

$$f''(x) + f(x) = \sec x, \quad f(0) = 1, \quad f'(0) = -1.$$


Solve the following equations by variation of parameters. 

5.8 2x”-L a = tant 5.9 #4 2' = 5.10 y’4-y=1 
5.11 y —y = cose 5.12 y’+4y =sec2Qr 5.13 yo’ + 4y = seex 
5.14 Show that the general solution

$$c_1\sin\Omega t + c_2\cos\Omega t$$

of the frictionless elastic equation $m(d^2x/dt^2) + cx = 0$ can be rewritten in the form

$$A\sin(\Omega t - \alpha).$$

(Remember that $\sin(x - y) = \sin x\cos y - \cos x\sin y$.) This type of motion along
a line is called simple harmonic motion.

5.15 In the above exercise express $A$ and $\alpha$ in terms of the initial values $dx/dt = v$
and $x = a$ when $t = 0$.


5.16 Consider now the freely vibrating system with friction taken into account, and
therefore having the equation

$$m(d^2x/dt^2) + k(dx/dt) + cx = 0,$$

all coefficients being positive. Show that if $k^2 < 4mc$, then the system oscillates forever,
but with amplitude decreasing exponentially. Determine the frequency of oscillation.
Use Exercise 5.14 to simplify the solution, and sketch its graph.

5.17 Show that if the frictional force is sufficiently large ($k^2 \ge 4mc$), then a freely
vibrating system does not in fact vibrate. Taking the simplest case $k^2 = 4mc$, sketch
the behavior of the system for the initial condition $dx/dt = 0$ and $x = a$ when $t = 0$.
Do the same for the initial condition $dx/dt = v_0$ and $x = 0$ when $t = 0$.

5.18 Use the method of undetermined coefficients to find a particular solution of the
equation of the driven electric circuit

$$L\frac{d^2x}{dt^2} + R\frac{dx}{dt} + \frac{x}{C} = a\sin\omega t.$$


Assuming that R > 0, show by a general argument that your particular solution is in 
fact the steady-state part (the part without exponential decay) of the general solution. 




5.19 In the above exercise show that the "current" $dx/dt$ for your solution can be
written in the form

$$\frac{dx}{dt} = \frac{a}{\sqrt{R^2 + X^2}}\sin(\omega t - \alpha),$$

where $X = L\omega - 1/\omega C$. Here $\alpha$ is called the phase angle.
5.20 Continuing our discussion, show that the current flowing in the circuit will have 
a maximum amplitude when the frequency of the "impressed voltage" $a\sin\omega t$ is
$1/2\pi\sqrt{LC}$. This is the phenomenon of resonance. Show also that the current is in
phase with the impressed voltage (ie., that a = 0} if and only if 2 = C = 0. 
5.21 What is the condition that the phase a be approximately 90°? —90°? 
5.22 Inthe theory of a stable equilibrium point in a dynamical system we end up with 
two sealar products (£, ») and ((£, 9}) on a finite-dimensional vector space V, the qua- 
dralic form g(g) = 4((é, £)) being the potential energy and pit’) = 3(&’, £") being the 
kinetic energy. Now we know that dg.{&) = ((a, £)) and similarly for p, and because of 
this fact it can be shown that the Lagrangian equations can be written 


a(4. ") = ((é, )). 


Prove that a basis {8;}] can be found for V such that this vector equation becomes the 
system of second-order equations 


Pr 
at2 


where the constants A; are positive. Show therefore that the motion of the system is the 
sum of 2 linearly independent simple harmonic motions. 


= dXi, $= Tivaas 


6. THE BOUNDARY-VALUE PROBLEM 


We now turn to a problem which seems to be like the initial-value problem but
which turns out to be of a wholly different character. Suppose that $T$ is a
second-order operator, which we consider over a closed interval $[a, b]$. Some of the most
important problems in physics require us to find solutions to $T(f) = g$ such that
$f$ has given values at $a$ and $b$, instead of $f$ and $f'$ having given values at a single
point $t_0$. This new problem is called a boundary-value problem, because $\{a, b\}$ is
the boundary of the domain $I = [a, b]$. The boundary-value problem, like the
initial-value problem, breaks neatly into two subproblems if the set

$$M = \{f \in \mathcal{C}^2([a, b]) : f(a) = f(b) = 0\}$$

turns out to be a complement of the null space $N$ of $T$. However, if the reader
will consider this general question for a moment, he will realize that he doesn't
have a clue to it from our initial-value development, and, in fact, wholly new
tools have to be devised.

Our procedure will be to forget that we are trying to solve the
boundary-value problem and instead to speculate on the nature of a linear differential
operator $T$ from the point of view of scalar products and the theory of
self-adjoint operators. That is, our present study of $T$ will be by means of the scalar
product $(f, g) = \int_a^b f(t)g(t)\,dt$, the general problem being the usual one of
solving $Tf = g$ by finding a right inverse $S$ of $T$. Also, as usual, $S$ may be
determined by finding a complement $M$ of $N(T)$. Now, however, it turns out that if $T$
is "formally self-adjoint", then suitable choices of $M$ will make the associated
right inverses $S$ self-adjoint and compact, and the eigenvectors of $S$, computed as
those solutions of the homogeneous equation $Tf - rf = 0$ which lie in $M$, then
allow (relatively) the same easy handling of $S$, by virtue of Theorem 5.1 of
Chapter 5, that they gave us earlier in the finite-dimensional situation.

We first consider the notion of "formal adjoint" for an $n$th-order linear
differential operator $T$. The ordinary formula for integration by parts,

$$\int_a^b f'g = fg\Big]_a^b - \int_a^b fg',$$


allows the derivatives of $f$ occurring in the scalar product $(Tf, g)$ to be shifted
one at a time to $g$. At the end, $f$ is undifferentiated and $g$ is acted on by a certain
$n$th-order linear differential operator $R$. The endpoint evaluations, like the
above $fg]_a^b$, that accumulate step by step can be described as

$$B(f, g)\Big]_a^b = \sum_{i,j} k_{ij}(x) f^{(i)}(x) g^{(j)}(x)\Big]_a^b,$$

where the coefficient functions $k_{ij}(x)$ are linear combinations of the coefficient
functions $a_i(x)$ and their derivatives. Thus

$$(Tf, g) = (f, Rg) + B(f, g)\Big]_a^b.$$


The operator $R$ is called the formal adjoint of $T$, and if $R = T$, we say that $T$ is
formally self-adjoint.

Every application of the integration by parts formula introduces a sign
change, and the reader may be able to see that the leading coefficient of $R$ is
$(-1)^n$ times the leading coefficient of $T$. Assuming this, we see that a necessary
condition for formal self-adjointness is that $n$ be even, so that $R$ and $T$ have the
same first terms.

Supposing that $T$ is formally self-adjoint, we seek a complement $M$ of the
null space $N$ of $T$ in $\mathcal{C}^2([a, b])$ with the further property that $S$, the associated
right inverse of $T$, is self-adjoint as a mapping from the pre-Hilbert space
$\mathcal{C}^0([a, b])$ to itself. Let us see what this further requirement amounts to. For
any $u, v \in \mathcal{C}^0$, set $f = Su$ and $g = Sv$, so that $f$ and $g$ are in $M$ and $u = Tf$,
$v = Tg$. Then $(u, Sv) = (Tf, g) = (f, Tg) + B(f, g)]_a^b = (Su, v) + B(f, g)]_a^b$.

We thus have: 


Lemma 6.1. If $T$ is a formally self-adjoint differential operator and $M$ is a
complement of the null space of $T$, then the right inverse of $T$ determined
by $M$ is self-adjoint if and only if

$$f, g \in M \implies B(f, g)\Big]_a^b = 0.$$




From now on we shall consider only the second-order case. However, almost 
everything that we are going to do works perfectly well for the general case, the 
price of generality being only additional notational complexity. 

We start by computing the formal adjoint of the second-order operator
$Tf = c_2f'' + c_1f' + c_0f$. We have

$$(Tf, g) = \int_a^b c_2f''g + \int_a^b c_1f'g + \int_a^b c_0fg,$$
$$\int_a^b f''(c_2g) = f'(c_2g)\Big]_a^b - \int_a^b f'(c_2g)' = \bigl(c_2f'g - f(c_2g)'\bigr)\Big]_a^b + \int_a^b f(c_2g)'',$$
$$\int_a^b f'(c_1g) = c_1fg\Big]_a^b - \int_a^b f(c_1g)',$$

giving

$$(f, Rg) = \int_a^b f\bigl((c_2g)'' - (c_1g)' + c_0g\bigr),$$

and

$$B(f, g) = c_2(f'g - g'f) + (c_1 - c_2')fg.$$

Thus $Rg = c_2g'' + (2c_2' - c_1)g' + (c_2'' - c_1' + c_0)g$ and $R = T$ if and only if
$2c_2' - c_1 = c_1$ (and $c_2'' - c_1' = 0$), that is, $c_2' = c_1$. We have proved:


Lemma 6.2. The second-order differential operator $T$ is formally
self-adjoint if and only if

$$Tf = c_2f'' + c_2'f' + c_0f = (c_2f')' + c_0f,$$

in which case

$$B(f, g) = c_2(f'g - g'f).$$

A constant coefficient operator is thus formally self-adjoint if and only
if $c_1 = 0$.
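The identity $(Tf, g) - (f, Tg) = c_2(f'g - g'f)]_a^b$ for a formally self-adjoint $T$ can be tested on a concrete case. The sketch below is an illustration with arbitrarily chosen data, none of it from the text: $c_2(t) = 1 + t$, $c_0(t) = t$, $f(t) = t^2$, $g(t) = \sin t$ on $[0, 1]$. The expressions for $Tf$ and $Tg$ were worked out by hand from $Tf = (c_2f')' + c_0f$, and Simpson's rule is used for the integrals.

```python
import math

def simpson(func, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

def c2(t): return 1 + t
def f(t): return t * t
def df(t): return 2 * t
def g(t): return math.sin(t)
def dg(t): return math.cos(t)

def Tf(t):
    # (c2 f')' + c0 f = 2t + 2(1 + t) + t^3
    return t ** 3 + 4 * t + 2

def Tg(t):
    # (c2 g')' + c0 g = cos t - (1 + t) sin t + t sin t
    return math.cos(t) - math.sin(t)

def B(t):
    """Boundary form c2 (f'g - g'f) of Lemma 6.2."""
    return c2(t) * (df(t) * g(t) - dg(t) * f(t))

lhs = simpson(lambda t: Tf(t) * g(t) - f(t) * Tg(t), 0.0, 1.0)
rhs = B(1.0) - B(0.0)
```

Here `lhs` approximates $(Tf, g) - (f, Tg)$ and `rhs` is the endpoint term; the two agree to within the quadrature error, and both equal $4\sin 1 - 2\cos 1$.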


Supposing that $T$ is formally self-adjoint, we now try to find a complement $M$
of its null space $N$ such that $f, g \in M \implies B(f, g)]_a^b = 0$. Since $N$ is
two-dimensional, any complement $M$ can be described as the intersection of the null spaces
of two linear functionals $l_1$ and $l_2$ on $X_2 = \mathcal{C}^2([a, b])$. For example, the "one
point" complement $M_{t_0}$ that we had earlier in connection with the initial-value
problem is the intersection of the null spaces of the two functionals $l_1(f) = f(t_0)$
and $l_2(f) = f'(t_0)$. Here, however, the vanishing of $l_1$ and $l_2$ for two functions
$f$ and $g$ must imply that $B(f, g)]_a^b = c_2(f'g - g'f)]_a^b = 0$, and the functionals
$l_i(f)$ must therefore involve the values of $f$ and $f'$ at $a$ and at $b$. We would
naturally guess, and it can be proved, that each of $l_1$ and $l_2$ must be of the form

$$l(f) = k_1f(a) + k_2f'(a) + k_3f(b) + k_4f'(b).$$

Our problem can therefore be restated as follows. We must find two linear
functionals $l_1$ and $l_2$ of the above general form such that if $M$ is the intersection
of their null spaces, then
a) $M$ is a complement of $N$, and
b) $f, g \in M \implies c_2(f'g - g'f)]_a^b = 0$,
in which case we call the boundary condition $l_1(f) = l_2(f) = 0$ self-adjoint.


Lemma 6.3. We can replace (a) by
a') $T$ is injective on $M$.

Proof. If $T$ is injective on $M$, then $M \cap N = \{0\}$, so that the map
$f \mapsto \langle l_1(f), l_2(f)\rangle$
is injective on $N$, and therefore, because $N$ is two-dimensional, is an isomorphism
from $N$ to $\mathbb{R}^2$ (by the corollary of Theorem 2.4, Chapter 2). Then $M$ is a
complement of $N$ by Theorem 5.3 of Chapter 1. □


Now we can easily write down various pairs $l_1$ and $l_2$ that form a self-adjoint
boundary condition. We list some below.

1) $f \in M \iff f(a) = f(b) = 0$ [that is, $l_1(f) = f(a)$ and $l_2(f) = f(b)$].

2) $f \in M \iff f'(a) = f'(b) = 0$.

3) More generally, $f'(a) = kf(a)$, $f'(b) = cf(b)$. (In fact, $l_1$ can be any $l$ that
depends only on the values at $a$, and $l_2$ can be any $l$ that depends only on $b$.
Thus $l_1(f) = k_1f(a) + k_2f'(a)$, and if $l_1(f) = l_1(g) = 0$, then the pairs
$\langle f(a), f'(a)\rangle$ and $\langle g(a), g'(a)\rangle$ are dependent, since both lie in the
one-dimensional null space of $l_1$, and so $(f'g - g'f)(a) = 0$. The same holds for $l_2$
and $b$, so that this split pair of endpoint conditions makes $B(f, g)]_a^b = 0$ by
making the values of $B$ at $a$ and at $b$ separately 0.)

4) If $c_2(a) = c_2(b)$, then take $f \in M \iff f(a) = f(b)$ and $f'(a) = f'(b)$. That
is, $l_1(f) = f(a) - f(b)$ and $l_2(f) = f'(a) - f'(b)$.

We now show that in every case but (3) the condition (a') also holds if we
replace $T$ by $T - \lambda$ for a suitable $\lambda$. This is true also for case (3), but the proof is
harder, and we shall omit it.


Lemma 6.4. Suppose that $M$ is defined by one of the self-adjoint boundary
conditions (1), (2), or (4) above, that $c_2(t) \ge m > 0$ on $[a, b]$, and that
$\lambda \ge c_0(t) + 1$ on $[a, b]$. Then

$$|((T - \lambda)f, f)| \ge m\|f'\|_2^2 + \|f\|_2^2$$

for all $f \in M$. In particular, $M$ is a complement of the null space of $T - \lambda$
and hence defines a self-adjoint right inverse of $T - \lambda$.


Proof. We have

$$((\lambda - T)f, f) = -\int_a^b (c_2f')'f + \int_a^b (\lambda - c_0)f^2$$
$$= -c_2f'f\Big]_a^b + \int_a^b c_2(f')^2 + \int_a^b (\lambda - c_0)f^2.$$




Under any of conditions (1), (2), or (4), $c_2f'f]_a^b = 0$, and the two integral terms
are clearly bounded below by $m\|f'\|_2^2$ and $\|f\|_2^2$, respectively. Lemma 6.3 then
implies that $M$ is a complement of the null space of $T - \lambda$. □


We come now to our main theorem. It says that the right inverse $S$ of $T - \lambda$
determined by the subspace $M$ above is a compact self-adjoint mapping of the
pre-Hilbert space $\mathcal{C}^0([a, b])$ into itself, and is therefore endowed with all the rich
eigenvalue structure of Theorem 5.1 of the last chapter. First we present some
classical terminology. A Sturm-Liouville system on $[a, b]$ is a formally self-adjoint
second-order differential operator $Tf = (c_2f')' + c_0f$ defined over the closed
interval $[a, b]$, together with a self-adjoint boundary condition $l_1(f) = l_2(f) = 0$
for that interval. If $c_2(t)$ is never zero on $[a, b]$, the system is called regular. If
$c_2(a)$ or $c_2(b)$ is zero, or if the interval $[a, b]$ is replaced by an infinite interval
such as $[a, \infty)$, then the system is called singular.


Theorem 6.1. If $T; l_1, l_2$ is a regular Sturm-Liouville system on $[a, b]$, with
$c_2$ positive, then the subspace $M$ defined by the homogeneous boundary
condition is a complement of $N(T - \lambda)$ if $\lambda$ is taken sufficiently large, and
the right inverse of $T - \lambda$ thus determined by $M$ is a compact self-adjoint
mapping of the pre-Hilbert space $\mathcal{C}^0([a, b])$ into itself.


Proof. The proof depends on the inequality of the above lemma. Since we have
proved this inequality only for boundary conditions (1), (2), and (4), our proof
will be complete only for those cases.

Set $g = (T - \lambda)f$. Since $\|g\|_2\|f\|_2 \ge |((T - \lambda)f, f)|$ by the Schwarz
inequality, we see from the lemma first that $\|f\|_2^2 \le \|g\|_2\|f\|_2$, so that

$$\|f\|_2 \le \|g\|_2,$$

and then that $m\|f'\|_2^2 \le \|g\|_2\|f\|_2 \le \|g\|_2^2$, so that

$$\|f'\|_2 \le \|g\|_2/\sqrt{m}.$$

We have already checked that the right inverse $S$ of the formally
self-adjoint $T - \lambda$ defined by $M$ is self-adjoint, and it remains for us to show that the
set $S[U] = \{f : \|g\|_2 \le 1\}$ has compact closure. For any such $f$ the Schwarz
inequality and the above inequality imply that

$$|f(y) - f(x)| \le \int_x^y |f'| = \int_x^y |f'| \cdot 1 \le \|f'\|_2\,|y - x|^{1/2} \le \frac{|y - x|^{1/2}}{\sqrt{m}}.$$

Thus $S[U]$ is uniformly equicontinuous. Since the common domain of the
functions in $S[U]$ is the compact set $[a, b]$, we will be able to conclude from
Theorem 6.1 of Chapter 4 that the set $S[U]$ is totally bounded if we can show
that there is a constant $C$ such that all the functions in $S[U]$ have their ranges in
$[-C, C]$. Taking $y$ and $x$ in the last inequality where $|f|$ assumes its maximum
and minimum values, we have $\|f\|_\infty - \min|f| \le (b - a)^{1/2}/\sqrt{m}$. But
$(\min|f|)(b - a)^{1/2} \le \|f\|_2 \le \|g\|_2 \le 1$, and therefore

$$\|f\|_\infty \le C = 1/(b - a)^{1/2} + (b - a)^{1/2}/\sqrt{m}.$$

Thus $S[U]$ is a uniformly equicontinuous set of functions mapping the
compact set $[a, b]$ into the compact set $[-C, C]$, and is therefore totally bounded in
the uniform norm. Since $\mathcal{C}([a, b])$ is complete in the uniform norm, every sequence
in $S[U]$ has a subsequence uniformly converging to some $f \in \mathcal{C}$, and since
$\|f\|_2 \le (b - a)^{1/2}\|f\|_\infty$, this subsequence also converges to $f$ in the two-norm.

We have thus shown that if H is the pre-Hilbert space C([a, b]) under the 
standard scalar product, then the image S[U] of the unit ball U ⊂ H under S has 
the property that every sequence in S[U] has a subsequence converging in H. 
This is the property we actually used in proving Theorem 5.1 of Chapter 5, but 
it is not quite the definition of the compactness of S, which requires us to show 
that the closure of S[U] is compact in H. However, if {ξ_n} is any sequence in this 
closure, then we can choose {ζ_n} in S[U] so that ‖ξ_n − ζ_n‖ < 1/n. The se- 
quence {ζ_n} has a convergent subsequence {ζ_{n(m)}}_m as above, and then {ξ_{n(m)}}_m 
converges to the same limit. Thus S is a compact operator. □ 


Theorem 6.2. There exists an orthonormal sequence {φ_n} consisting entirely 
of eigenvectors of T and forming a basis for M. Moreover, the Fourier 
expansion of any f ∈ M with respect to the basis {φ_n} converges uniformly 
to f (as well as in the two-norm). 


Proof. By Theorem 5.1 of Chapter 5 there exists an eigenbasis for the range of S, 
which is M. Since Sφ_n = r_nφ_n for some nonzero r_n, we have (T − λ)(r_nφ_n) = 
φ_n and Tφ_n = ((1 + λr_n)/r_n)φ_n. The uniformity of the series convergence 
comes out of the following general consideration. 


Lemma 6.5. Suppose that T is a self-adjoint operator on a pre-Hilbert 
space V and that T is compact as a mapping from V to <V, q>, where q is a 
second norm on V that dominates the scalar product norm p (p ≤ cq). 
Then T is compact (from p to p), and the eigenbasis expansion Σ b_iφ_i of an 
element β in the range of T converges to β in both norms. 


Proof. Let U be the unit ball of V in the scalar product norm. By the hypothesis 
of the lemma, the q-closure B of T[U] is compact. B is then also p-compact, for 
any sequence in it has a q-convergent subsequence which also p-converges to the 
same limit, because p ≤ cq. We can therefore apply the eigenbasis theorem. 

Now let α and β = T(α) have the Fourier series Σ a_iφ_i and Σ b_iφ_i, and let 
T(φ_i) = r_iφ_i. Then b_i = r_ia_i, because b_i = (T(α), φ_i) = (α, T(φ_i)) = 
(α, r_iφ_i) = r_i(α, φ_i) = r_ia_i. Since the sequence of partial sums Σ_1^n a_iφ_i is 
p-bounded (Bessel's inequality), the sequence {Σ_1^n b_iφ_i} = {T(Σ_1^n a_iφ_i)} is 
totally q-bounded. Any subsequence of it therefore has a subsubsequence 
q-converging to some element γ in V. Since it then p-converges to γ, γ must be β. 
Thus every subsequence has a subsubsequence q-converging to β, and so 
{Σ_1^n b_iφ_i} itself q-converges to β by Lemma 4.1 of Chapter 4. □ 


EXERCISES 


6.1 Given that Tf(x) = 2f″(x) + f(x) and Sf(x) = f′(x), compute T ∘ S and S ∘ T. 

6.2 Show that the differential operators T = aD and S = bD commute if and 
only if the functions a(x) and b(x) are proportional. 

6.3 Show that the differential operators T = aD^2 and S = bD commute if and 
only if b(x) is a first-degree polynomial b(x) = cx + d and a(x) = k(b(x))^2. 

6.4 Compute the formal adjoint S of T if 

a) Tf = f′, b) Tf = f″, c) Tf = f‴, d) (Tf)(x) = xf′(x), 
e) (Tf)(x) = xf″(x). 

6.5 Let S and T be linear differential operators of orders m and n, respectively. 
What are the coefficient conditions for S ∘ T to be a linear differential operator of 
order m + n? 

6.6 Let T be the second-order linear differential operator 

    (Tf)(t) = a_2(t)f″(t) + a_1(t)f′(t) + a_0(t)f(t). 

What are the conditions on its coefficient functions for its formal adjoint to exist? 
What are these conditions for T of order n? 

6.7 Let S and T be linear differential operators of order m and n, respectively, and 
suppose that all coefficients are C^∞-functions (infinitely differentiable). Prove that 
S ∘ T − T ∘ S is of order ≤ m + n − 1. 

6.8 A δ-blip is a continuous nonnegative function φ such 
that φ = 0 outside of [−δ, δ] and ∫_{−δ}^{δ} φ = 1 (Fig. 6.4). We 
assume that there exists an infinitely differentiable 1-blip φ. 
Show that there exists an infinitely differentiable δ-blip for 
every δ > 0. Define what you would mean by a δ-blip centered 
at x, and show that one exists. 


Fig. 6.4 


6.9 Let f be a continuous function on [a, b] such that (f, g) = ∫_a^b fg = 0 whenever 
g is an infinitely differentiable function which vanishes near a and b. Show that f = 0. 
(Use the above exercise.) 


6.10 Let C^∞([a, b]) be the vector space of infinitely differentiable functions on [a, b], 
and let T be a second-order linear differential operator with coefficients in C^∞: 

    (Tf)(t) = a_2(t)f″(t) + a_1(t)f′(t) + a_0(t)f(t). 

Let S be a linear operator on C^∞([a, b]) such that 

    (Tf, g) − (f, Sg) = K(f, g) 

is a bilinear functional depending only on the values of f, g, f′, and g′ at a and b. 
Prove that S is the formal adjoint of T. (Hint: Take f to be a δ-blip centered at x. 
Then K(f, g) = 0. Now try to work the assertion to be proved into a form to which 
the above exercise can be applied.) 




6.11 Prove an nth-order generalization of the above exercise. 

6.12 Let X be the space of linear differential operators with C^∞-coefficients, and let A_T 
be the formal adjoint of T. Prove that T ↦ A_T is an isomorphism from X to X. 
Prove that A_{T∘S} = A_S ∘ A_T. 


7. FOURIER SERIES 


There are not many regular Sturm-Liouville systems whose associated ortho- 
normal eigenbases have proved to be important in actual calculations. Most 
orthonormal bases that are used, such as those due to Bessel, Legendre, Hermite, 
and Laguerre, arise from singular Sturm-Liouville systems and are therefore 
beyond the limitations we have set for this discussion. However, the most well- 
known example, Fourier series, is available to us. 

We shall consider the constant coefficient operator Tf = D^2f, which is clearly 
both formally self-adjoint and regular, and either the boundary condition 
f(0) = f(π) = 0 on [0, π] (type 1) or the periodic boundary condition f(−π) = 
f(π), f′(−π) = f′(π) on [−π, π] (type 4). 

To solve the first problem, we have to find the solutions of f″ − λf = 0 
which satisfy f(0) = f(π) = 0. If λ > 0, then we know that the two-dimen- 
sional solution space is spanned by {e^{rx}, e^{−rx}}, where r = λ^{1/2}. But if c_1e^{rx} + 
c_2e^{−rx} is 0 at both 0 and π, then c_1 = c_2 = 0 (because the pairs <1, 1> and 
<e^{rπ}, e^{−rπ}> are independent). Therefore, there are no solutions satisfying the 
boundary conditions when λ > 0. If λ = 0, then f(x) = c_1x + c_0 and again 
c_1 = c_0 = 0. 

If λ < 0, then the solution space is spanned by {sin rx, cos rx}, where 
r = (−λ)^{1/2}. Now if c_1 sin rx + c_2 cos rx is 0 at x = 0 and x = π, we get, 
first, that c_2 = 0 and, second, that rπ = nπ for some integer n. Thus the 
eigenfunctions for the first system form the set {sin nx}_1^∞, and the corresponding 
eigenvalues of D^2 are {−n^2}_1^∞. 
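
The eigenvalue computation above can be checked numerically. The following is only an illustrative sketch in Python, not part of the argument: the helper names (second_derivative, eigen_residual, boundary_values) and the step size h are our own assumptions, and the second derivative is approximated by a central difference.

```python
import math

def second_derivative(f, x, h=1e-4):
    # Central-difference approximation of f''(x); h is an assumed step size.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def eigen_residual(n, x):
    # Residual of f'' - lambda*f = 0 for f(x) = sin(nx) and lambda = -n^2.
    f = lambda t: math.sin(n * t)
    return second_derivative(f, x) + n * n * f(x)

def boundary_values(n):
    # Values of sin(nx) at the endpoints 0 and pi (type 1 boundary condition).
    return math.sin(0.0), math.sin(n * math.pi)
```

Both the residual and the boundary values should vanish up to discretization and rounding error for every positive integer n.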

At the end of this section we shall prove that the functions in C^2([a, b]) 
that are zero near a and b are dense in C([a, b]) in the two-norm. Assuming this, 
it follows from Theorem 2.3 of Chapter 5 that a basis for M is a basis for C^0, and 
we now have the following corollary of the Sturm-Liouville theorem. 


Theorem 7.1. The sequence {sin nx}_1^∞ is an orthogonal basis for the pre- 
Hilbert space C^0([0, π]). If f ∈ C^2([0, π]) and f(0) = f(π) = 0, then the 
Fourier series for f converges uniformly to f. 
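
To see the uniform convergence of Theorem 7.1 at work, one can compute partial sums for a sample function. The sketch below is an illustration under our own assumptions: the test function f(x) = x(π − x), the midpoint-rule quadrature, and the grid on which the maximum error is sampled are all choices made here, not taken from the text.

```python
import math

def f(x):
    # Sample function in C^2([0, pi]) with f(0) = f(pi) = 0 (our choice).
    return x * (math.pi - x)

def sine_coefficient(n, samples=2000):
    # b_n = (f, sin nx) / (sin nx, sin nx), where (sin nx, sin nx) = pi/2;
    # the integral over [0, pi] is approximated by the midpoint rule.
    h = math.pi / samples
    total = sum(f((k + 0.5) * h) * math.sin(n * (k + 0.5) * h)
                for k in range(samples))
    return (2.0 / math.pi) * total * h

def sup_error(terms, grid=200):
    # Largest deviation of the partial sum from f on a uniform grid.
    coeffs = [sine_coefficient(n) for n in range(1, terms + 1)]
    worst = 0.0
    for k in range(grid + 1):
        x = math.pi * k / grid
        partial = sum(b * math.sin((n + 1) * x) for n, b in enumerate(coeffs))
        worst = max(worst, abs(f(x) - partial))
    return worst
```

Comparing sup_error(5) with sup_error(25) shows the sampled maximum error shrinking as more terms are taken.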

We now consider the second boundary problem. The computations are a 
little more complicated, but again if f(x) = c_1e^{rx} + c_2e^{−rx}, and if f(−π) = f(π) 
and f′(−π) = f′(π), then f = 0. For now we have 

    c_1e^{−rπ} + c_2e^{rπ} = c_1e^{rπ} + c_2e^{−rπ}, 

giving c_1 = c_2, and 

    c_1re^{−rπ} − c_2re^{rπ} = c_1re^{rπ} − c_2re^{−rπ}, 

giving c_1(e^{rπ} − e^{−rπ}) = 0, and so c_1 = 0. Again f(x) = c_1x + c_0 is ruled out 
unless c_1 = 0, which leaves only the constant functions (the case n = 0 below). 
Finally, if f(x) = c_1 sin rx + c_2 cos rx, our boundary conditions become 

    2c_1 sin rπ = 0 and 2rc_2 sin rπ = 0, 

so that again r = n, but this time the full solution space of (D^2 + n^2)f = 0 
satisfies the boundary condition. 


Theorem 7.2. The set {sin nx}_1^∞ ∪ {cos nx}_0^∞ forms an orthogonal basis for 
the pre-Hilbert space C^0([−π, π]). If f ∈ C^2([−π, π]) and f(−π) = f(π), 
f′(−π) = f′(π), then the Fourier series for f converges to f uniformly on 
[−π, π]. 


Remaining proof. This theorem follows from our general Sturm-Liouville dis- 
cussion except for the orthogonality of sin nx and cos nx. We have 

    (sin nx, cos nx) = ∫_{−π}^{π} sin nt cos nt dt 

                     = (1/2) ∫_{−π}^{π} sin 2nt dt 

                     = −(1/4n) cos 2nt ]_{−π}^{π} 

                     = 0. 


Or we can simply remark that the first integrand is an odd function and therefore 
its integral over any symmetric interval [−a, a] is necessarily zero. 

The orthogonality of eigenvectors having different eigenvalues follows of 
course, as in the proof of Theorem 3.1 of Chapter 5. □ 
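
The orthogonality relations can also be confirmed by direct numerical integration. The following Python sketch is illustrative only; the function names and the midpoint rule with an assumed number of sample points are our own choices.

```python
import math

def inner(f, g, a=-math.pi, b=math.pi, samples=4000):
    # Midpoint-rule approximation of the scalar product (f, g) on [a, b].
    h = (b - a) / samples
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h)
               for k in range(samples)) * h

def sin_n(n):
    # The function t -> sin(nt).
    return lambda t: math.sin(n * t)

def cos_n(n):
    # The function t -> cos(nt).
    return lambda t: math.cos(n * t)
```

For instance, inner(sin_n(3), cos_n(3)) and inner(sin_n(2), sin_n(5)) should both be zero up to rounding, while inner(cos_n(1), cos_n(1)) should be close to π.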


Finally, we prove the density theorem we needed above. There are very 
slick ways of doing this, but they require more machinery than we have avail- 
able, and rather than taking the time to make the machines, we shall prove the 
theorem with our bare hands. 

It is standard notation to let a subscript zero on a symbol denoting a class 
of functions pick out those functions in the class that are zero "on the boundary" 
in some suitable sense. Here C_0([a, b]) will denote the functions in C([a, b]) that 
are zero in neighborhoods of a and b, and similarly for C_0^2([a, b]). 


Theorem 7.3. C^2([a, b]) is dense in C([a, b]) in the uniform norm, and 
C_0^2([a, b]) is dense in C([a, b]) in the two-norm. 




Proof. We first approximate f ∈ C([a, b]) to within ε by a piecewise "linear" 
function g by drawing straight line segments between the adjacent points on the 
graph of f lying over a subdivision a = x_0 < x_1 < ··· < x_n = b of [a, b]. 
If f varies by less than ε on each interval (x_{i−1}, x_i), then ‖f − g‖_∞ ≤ ε. Now 
g′(t) is a step function which is constant on the intervals of the above sub- 
division. We now alter g′(t) slightly near each jump in such a way that the new 
function h(t) is continuous there. If we do it as sketched in Fig. 6.5, the total 
integral error at the jump is zero, ∫_{x_i−δ}^{x_i+δ} (h − g′) = 0, and the maximum error 
|∫_{x_i−δ}^{x_i} (h − g′)| is δΔ/4. This will be less than ε if we take δ = ε/‖g′‖_∞, since Δ ≤ 2‖g′‖_∞. 
We now have a continuous function h such that |∫_a^x h(t) dt − (f(x) − f(a))| < 2ε. 
In other words, we have approximated f uniformly by a continuously differ- 
entiable function. 

Fig. 6.5        Fig. 6.6 

Now choose g and h in C^1([a, b]) so that first ‖f − g‖_∞ < ε/2 and then 
‖g′ − h‖_∞ < ε/(2(b − a)). Then 

    |∫_a^x h − (g(x) − g(a))| < ε/2, 

and so H(x) = ∫_a^x h + g(a) is a twice continuously differentiable function such 
that ‖f − H‖_∞ < ε. In other words, C^2([a, b]) is dense in C([a, b]) in the 
uniform norm. It is then also dense in the two-norm, since 

    ‖f‖_2 = (∫_a^b f^2)^{1/2} ≤ ‖f‖_∞ (∫_a^b 1)^{1/2} = (b − a)^{1/2} ‖f‖_∞. 


But now we can do something which we couldn't do for the uniform norm: 
we can alter the approximating function to one that is zero on neighborhoods of a 
and b, and keep the two-norm approximation good. Given δ, let e(t) be a non- 
negative function on [a, b] such that e(t) = 1 on [a + 2δ, b − 2δ], e(t) = 0 on 
[a, a + δ] and on [b − δ, b], e″ is continuous, and ‖e‖_∞ = 1. Such an e(t) clearly 
exists, since we can draw it. We leave it as an interesting exercise to actually 
define e(t). Here is a hint: Show somehow that there is a fifth-degree polynomial 
p(t) having a graph between 0 and 1 as shown in Fig. 6.6, with a zero second 
derivative at 0 and at 1, and then use a piece of the graph, suitably translated, 
compressed, rotated, etc., to help patch together e(t). 

Anyway, then ‖g − eg‖_2 ≤ ‖g‖_∞ (4δ)^{1/2} for any g in C([a, b]), and if g has 
continuous derivatives up to order 2, then so does eg. Thus, if we start with f in C 
and approximate it by g in C^2, and then approximate g by eg, we have altogether 
the second approximation of the theorem. □ 
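
One way to make the cutoff function e(t) concrete is sketched below in Python. This is only one possible construction, offered as an illustration: the quintic p, the helper names, and the requirement 4δ ≤ b − a are our own assumptions, and the text deliberately leaves the actual definition as an exercise.

```python
import math

def p(x):
    # A quintic with p(0) = 0, p(1) = 1 and zero first and second
    # derivatives at both 0 and 1 (one possible choice, assumed here).
    return 10 * x**3 - 15 * x**4 + 6 * x**5

def cutoff(t, a, b, delta):
    # e(t): 0 within delta of either endpoint, 1 on [a + 2*delta, b - 2*delta],
    # joined by rescaled copies of p. Assumes 4*delta <= b - a.
    d = min(t - a, b - t)  # distance from t to the nearer endpoint
    if d <= delta:
        return 0.0
    if d >= 2 * delta:
        return 1.0
    return p((d - delta) / delta)

def two_norm(f, a, b, samples=20000):
    # Midpoint-rule approximation of the two-norm of f on [a, b].
    h = (b - a) / samples
    return math.sqrt(sum(f(a + (k + 0.5) * h) ** 2 for k in range(samples)) * h)
```

With such an e, the quantity two_norm of g − eg can be compared against the bound ‖g‖_∞ (4δ)^{1/2} for any sample g.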


EXERCISES 


7.1 Convert the orthogonal basis {sin nx}_1^∞ for the pre-Hilbert space C([0, π]) to 
an orthonormal basis. 

7.2 Do the same for the orthogonal basis {sin nx}_1^∞ ∪ {cos nx}_0^∞ for C([−π, π]). 

7.3 Show that {sin nx}_1^∞ is an orthogonal basis for the vector space F of all odd 
continuous functions on [−π, π]. (Be clever. Do not calculate from scratch.) Normal- 
ize the above basis. 

7.4 State and prove the corresponding theorem for the even functions on [−π, π]. 

7.5 Prove that the derivative of an odd function is even, and conversely. 


7.6 We now want to prove the following stronger theorem about the uniform 
convergence of Fourier series. 


Theorem. Let f have a continuous derivative on [−π, π], and suppose that 
f(−π) = f(π). Then the Fourier series for f converges to f uniformly. 


Assume for convenience that f is even. (This only cuts down the number of 
calculations.) Show first that the Fourier series for f′ is the series obtained from the 
Fourier series for f by term-by-term differentiation. Apply the above exercises here. 
Next show from the two-norm convergence of its Fourier series to f′ and the Schwarz 
inequality that the Fourier series for f converges uniformly. 


7.7 Prove that {cos nx}_0^∞ is an orthogonal basis for the space M of C^∞-functions 
on [0, π] such that f′(0) = f′(π) = 0. 


7.8 Find a fifth-degree polynomial p(x) such that 

    p(0) = p′(0) = p″(0) = 0, p′(1) = p″(1) = 0, p(1) = 1. 

(Forget the last condition until the end.) Sketch the graph of p. 


7.9 Use a "piece" of the above polynomial p to construct a function e(x) such that e′ 
and e″ exist and are continuous, e(x) = 0 when x < a + δ and x > b − δ, e(x) = 1 
on [a + 2δ, b − 2δ], and ‖e‖_∞ = 1. 


7.10 Prove the Weierstrass theorem given below on [0, π] in the following steps. We 
know that f can be uniformly approximated by a C^2-function g. 


1) Show that c and d can be found such that g(t) − ct − d is 0 at 0 and π. 


2) Use the Fourier series expansion of this function and the Maclaurin series for 
the functions sin nx to show that the polynomial p(x) can be found. 


Theorem (The Weierstrass approximation theorem). The polynomials are dense 
in C([a, b]) in the uniform norm. That is, given any continuous function f on 
[a, b] and any ε, there is a polynomial p such that |f(x) − p(x)| < ε for all x 
in [a, b]. 


CHAPTER 7 


MULTILINEAR FUNCTIONALS 


This chapter is principally for reference. Although most of the proofs will be 
included, the reader is not expected to study them. Our goal is a collection of 
basic theorems about alternating multilinear functionals, or exterior forms, and 
the determinant function is one of our rewards. 


1. BILINEAR FUNCTIONALS 


We have already studied various aspects of bilinear functionals. We looked at 
their duality implications in Section 6, Chapter 1, we considered the "canonical 
forms" of symmetric bilinear functionals and their equivalent quadratic forms 
in Section 7, Chapter 2, and, of course, the whole scalar product theory of 
Chapter 5 is the theory of a still more special kind of bilinear functional. In this 
chapter we shall restrict ourselves to bilinear and multilinear functionals over 
finite-dimensional spaces, and our concerns are purely algebraic. 

We begin with some material related to our earlier algebra. If V and W are 
finite-dimensional vector spaces, then the set of all bilinear functionals on 
V × W is pretty clearly a vector space. We designate it V* ⊗ W* and call it 
the tensor product of V* and W*. Our first theorem simply states something 
that was implicit in Theorem 6.1 of Chapter 1. 


Theorem 1.1. The vector spaces V* ⊗ W*, Hom(V, W*), and Hom(W, V*) 
are naturally isomorphic. 


Proof. We saw in Theorem 6.1 of Chapter 1 that each f in V* ⊗ W* determines 
a linear mapping η ↦ f_η from W to V*, where f_η(ξ) = f(ξ, η), and we also noted 
that this correspondence from V* ⊗ W* to Hom(W, V*) is bijective. All that 
the present theorem adds is that this bijective correspondence is linear and so 
constitutes a natural isomorphism, as does the similar one from V* ⊗ W* to 
Hom(V, W*). To see this, let f_T be the bilinear functional corresponding to T in 
Hom(V, W*). Then f_{T+S} = f_T + f_S, for f_{T+S}(α, β) = ((T + S)(α))(β) = 
(T(α) + S(α))(β) = (T(α))(β) + (S(α))(β) = f_T(α, β) + f_S(α, β). We can 
do the same for homogeneity. 

The isomorphism of V* ⊗ W* with Hom(W, V*) follows in exactly the 
same way by reversing the roles of the variables. We are thus finished with the 
proof. □ 



Before looking for bases in V* ⊗ W*, we define a bilinear functional ζ ⊗ λ 
for any two functionals ζ ∈ V* and λ ∈ W* by (ζ ⊗ λ)(ξ, η) = ζ(ξ)λ(η). 
We call ζ ⊗ λ the tensor product of the functionals ζ and λ and call any bilinear 
functional having this form elementary. It is not too hard to see that f ∈ V* ⊗ W* 
is elementary if and only if the corresponding T ∈ Hom(V, W*) is a dyad. 

If V and W are finite-dimensional, with dimensions m and n, respectively, 
then the above isomorphism of V* ⊗ W* with Hom(V, W*) shows that the 
dimension of V* ⊗ W* is mn. We now describe the basis determined by given 
bases in V and W. 


Theorem 1.2. Let {α_i}_1^m and {β_j}_1^n be any bases for V and W, and let their 
dual bases in V* and W* be {μ_i}_1^m and {ν_j}_1^n. Then the mn elementary 
bilinear functionals {μ_i ⊗ ν_j} form the corresponding basis for V* ⊗ W*. 


Proof. Since (μ_i ⊗ ν_j)(ξ, η) = μ_i(ξ)ν_j(η) = x_iy_j, the matrix expansion f(ξ, η) = 
Σ_{i,j} t_{ij}x_iy_j becomes f(ξ, η) = Σ_{i,j} t_{ij}(μ_i ⊗ ν_j)(ξ, η), or 

    f = Σ_{i,j} t_{ij}(μ_i ⊗ ν_j). 

The set {μ_i ⊗ ν_j} thus spans V* ⊗ W*. Since it contains the same number of 
elements (mn) as the dimension of V* ⊗ W*, it forms a basis. □ 


Of course, independence can also be checked directly. If Σ_{i,j} t_{ij}(μ_i ⊗ ν_j) = 
0, then for every pair <k, l>, t_{kl} = Σ_{i,j} t_{ij}(μ_i ⊗ ν_j)(α_k, β_l) = 0. 
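
The expansion f = Σ t_{ij}(μ_i ⊗ ν_j) and the recovery of the coefficients by t_{kl} = f(α_k, β_l) can be tried out on coordinates. The Python sketch below is an illustration only; it assumes V = R^m and W = R^n with their standard bases, and all function names are our own.

```python
def bilinear(t):
    # f(xi, eta) = sum over i, j of t[i][j] * xi[i] * eta[j].
    return lambda xi, eta: sum(t[i][j] * xi[i] * eta[j]
                               for i in range(len(xi))
                               for j in range(len(eta)))

def elementary(i, j):
    # (mu_i (x) nu_j)(xi, eta) = mu_i(xi) * nu_j(eta) = xi[i] * eta[j].
    return lambda xi, eta: xi[i] * eta[j]

def unit(k, dim):
    # k-th standard basis vector of R^dim (0-based index).
    return [1 if r == k else 0 for r in range(dim)]

def coordinates(f, m, n):
    # t_kl = f(alpha_k, beta_l) recovers the matrix of f.
    return [[f(unit(k, m), unit(l, n)) for l in range(n)] for k in range(m)]
```

Evaluating coordinates on a functional built by bilinear returns the original matrix, and summing t[i][j] times the elementary functionals reproduces f at any pair of vectors.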

We should also remark that this theorem is entirely equivalent to our 
discussion of the basis for Hom(V, W) at the end of Section 4, Chapter 2. 


2. MULTILINEAR FUNCTIONALS 


All the above considerations generalize to multilinear functionals 

    f: V_1 × ··· × V_n → R. 

We change notation, just as we do in replacing the traditional <x, y> ∈ R^2 by 
x = <x_1, ..., x_n> ∈ R^n. Thus we write f(α_1, ..., α_n) = f(α), where 
α = <α_1, ..., α_n> ∈ V_1 × ··· × V_n. Our requirement now is that 
f(α_1, ..., α_n) be a linear functional of α_j when α_i is held fixed for all i ≠ j. 
The set of all such functionals is a vector space, called the tensor product of the 
dual spaces V_1*, ..., V_n*, and is designated V_1* ⊗ ··· ⊗ V_n*. 

As before, there are natural isomorphisms between these tensor product 
spaces and various Hom spaces. For example, 

    V_1* ⊗ ··· ⊗ V_n* and Hom(V_1, V_2* ⊗ ··· ⊗ V_n*) 

are naturally isomorphic. Also, there are additional isomorphisms of a variety 
not encountered in the bilinear case. However, it will not be necessary for us to 
look into these questions. 

We define elementary multilinear functionals as before. If λ_i ∈ V_i*, i = 
1, ..., n, and ξ = <ξ_1, ..., ξ_n>, then 

    (λ_1 ⊗ ··· ⊗ λ_n)(ξ) = λ_1(ξ_1) ··· λ_n(ξ_n). 

To keep our notation as simple as possible, and also because it is the case of 
most interest to us, we shall consider the question of bases only when V_1 = 
V_2 = ··· = V_n = V. In this case (V*)^ⓝ = V* ⊗ ··· ⊗ V* (n factors) is 
called the space of covariant tensors of order n (over V). 

If {α_j}_1^m is a basis for V and f ∈ (V*)^ⓝ, then we can expand the value 
f(ξ) = f(ξ_1, ..., ξ_n) with respect to the basis expansions of the vectors ξ_i just 
as we did when f was bilinear, but now the result is notationally more complex. 
If we set ξ_i = Σ_{j=1}^m x_j^i α_j for i = 1, ..., n (so that the coordinate set of ξ_i is 
x^i = {x_j^i}_j) and use the linearity of f(ξ_1, ..., ξ_n) in its separate variables one 
variable at a time, we get 

    f(ξ_1, ..., ξ_n) = Σ x_{p_1}^1 x_{p_2}^2 ··· x_{p_n}^n f(α_{p_1}, ..., α_{p_n}), 

where the sum is taken over all n-tuples p = <p_1, ..., p_n> such that 1 ≤ p_i ≤ 
m for each i from 1 to n. The set of all these n-tuples is just the set of all func- 
tions from {1, ..., n} to {1, ..., m}. We have designated this set m̄^n̄, using the 
notation n̄ = {1, ..., n}, and the scope of the above sum can thus be indicated 
in the formula as follows: 

    f(ξ_1, ..., ξ_n) = Σ_{p ∈ m̄^n̄} x_{p_1}^1 ··· x_{p_n}^n f(α_{p_1}, ..., α_{p_n}). 

A strict proof of this formula would require an induction on n, and is left to the 
interested reader. At the inductive step he will have to rewrite a double sum 
Σ_{p ∈ m̄^n̄} Σ_{j ∈ m̄} as the single sum Σ_{q ∈ m̄^(n+1)}, using the fact that an ordered pair 
<p, j> in m̄^n̄ × m̄ is equivalent to an (n + 1)-tuple q ∈ m̄^(n+1), where q_i = p_i 
for i = 1, ..., n and q_{n+1} = j. 

If {μ_i}_1^m is the dual basis for V* and q ∈ m̄^n̄, let μ_q be the elementary func- 
tional μ_{q_1} ⊗ ··· ⊗ μ_{q_n}. Thus μ_q(α_{p_1}, ..., α_{p_n}) = Π_{i=1}^n μ_{q_i}(α_{p_i}) = 0 unless p = q, 
in which case its value is 1. More generally, 

    μ_q(ξ_1, ..., ξ_n) = μ_{q_1}(ξ_1) ··· μ_{q_n}(ξ_n) = x_{q_1}^1 ··· x_{q_n}^n. 

Therefore, if we set c_q = f(α_{q_1}, ..., α_{q_n}), the general expansion now appears as 

    f(ξ_1, ..., ξ_n) = Σ_{p ∈ m̄^n̄} c_p μ_p(ξ_1, ..., ξ_n), 

or f = Σ c_pμ_p, which is the same formula we obtained in the bilinear case, but 
with more sophisticated notation. The functionals {μ_p : p ∈ m̄^n̄} thus span 
(V*)^ⓝ. They are also independent. For, if Σ c_pμ_p = 0, then for each q, 
c_q = Σ_p c_pμ_p(α_{q_1}, ..., α_{q_n}) = 0. We have proved the following theorem. 




Theorem 2.1. The set {μ_p : p ∈ m̄^n̄} is a basis for (V*)^ⓝ. For any f in 
(V*)^ⓝ its coordinate function {c_p} is defined by c_p = f(α_{p_1}, ..., α_{p_n}). 
Thus f = Σ c_pμ_p and f(ξ_1, ..., ξ_n) = Σ c_pμ_p(ξ_1, ..., ξ_n) = Σ c_p x_{p_1}^1 ··· x_{p_n}^n 
for any f ∈ (V*)^ⓝ and any <ξ_1, ..., ξ_n> ∈ V^n. 


Corollary. The dimension of (V*)^ⓝ is m^n. 


Proof. There are m^n functions in m̄^n̄, so the basis {μ_p : p ∈ m̄^n̄} has m^n ele- 
ments. □ 
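
The index set m̄^n̄ and the coordinate function c_p are easy to enumerate by machine. The following Python sketch is illustrative only: it assumes V = R^m with its standard basis, uses 0-based indices, and every name in it is our own.

```python
from itertools import product

def index_tuples(m, n):
    # All functions p from {1..n} to {1..m}, written as n-tuples (0-based).
    return list(product(range(m), repeat=n))

def unit(k, m):
    # k-th standard basis vector of R^m.
    return [1 if r == k else 0 for r in range(m)]

def coordinates(f, m, n):
    # c_p = f(alpha_{p_1}, ..., alpha_{p_n}) for the standard basis of R^m.
    return {p: f(*[unit(k, m) for k in p]) for p in index_tuples(m, n)}

def expand(c, vectors):
    # f(xi_1, ..., xi_n) = sum over p of c_p * x^1_{p_1} * ... * x^n_{p_n}.
    total = 0
    for p, cp in c.items():
        term = cp
        for i, k in enumerate(p):
            term *= vectors[i][k]
        total += term
    return total
```

For a trilinear functional on R^2 the dictionary returned by coordinates has 2^3 = 8 entries, and expand reproduces the value of the functional from them.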


3. PERMUTATIONS 


A permutation on a set S is a bijection f: S → S. If 𝔖(S) is the set of all permu- 
tations on S, then 𝔖 = 𝔖(S) is closed under composition (σ, ρ ∈ 𝔖 ⇒ σ ∘ ρ ∈ 𝔖) 
and inversion (σ ∈ 𝔖 ⇒ σ^{−1} ∈ 𝔖). Also, the identity map I is in 𝔖, and, of course, 
the composition operation is associative. Together these statements say exactly 
that 𝔖 is a group under composition. The simplest kind of permutation other 
than I is one which interchanges a pair of elements of S and leaves every other 
element fixed. Such a permutation is called a transposition. 

We now take S to be the finite set n̄ = {1, ..., n} and set 𝔖_n = 𝔖(n̄). 
It is not hard to see that then any permutation can be expressed as a product of 
transpositions, and in more than one way. 

A more elementary fact that we shall need is that if ρ is a fixed element of 𝔖_n, 
then the mapping σ ↦ σ ∘ ρ is a bijection 𝔖_n → 𝔖_n. It is surjective because 
any σ′ can be written σ′ = (σ′ ∘ ρ^{−1}) ∘ ρ, and it is injective because σ_1 ∘ ρ = 
σ_2 ∘ ρ ⇒ (σ_1 ∘ ρ) ∘ ρ^{−1} = (σ_2 ∘ ρ) ∘ ρ^{−1} ⇒ σ_1 = σ_2. Similarly, the mapping 
σ ↦ ρ ∘ σ (ρ fixed) is bijective. 

We also need the fact that there are n! elements in 𝔖_n. This is the ele- 
mentary count from secondary school algebra. In defining an element σ ∈ 𝔖_n, 
σ(1) can be chosen in n ways. For each of these choices σ(2) can be chosen in 
n − 1 ways, so that <σ(1), σ(2)> can be chosen in n(n − 1) ways. For each of 
these choices σ(3) can be chosen in n − 2 ways, etc. Altogether σ can be chosen 
in n(n − 1)(n − 2) ··· 1 = n! ways. 
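
Both facts, the count n! and the bijectivity of composition with a fixed permutation, can be checked by brute force for small n. The Python sketch below is an illustration; permutations are represented as tuples on {0, ..., n − 1}, and the helper names are our own.

```python
from itertools import permutations
from math import factorial

def compose(sigma, rho):
    # (sigma o rho)(i) = sigma(rho(i)); permutations are tuples on {0..n-1}.
    return tuple(sigma[rho[i]] for i in range(len(rho)))

def symmetric_group(n):
    # All n! permutations of {0, ..., n-1}.
    return list(permutations(range(n)))
```

Composing every element of symmetric_group(4) with one fixed permutation, on either side, should again list each of the 24 permutations exactly once.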

In the sequel we shall often write 'ρσ' instead of 'ρ ∘ σ', just as we occasion- 
ally wrote 'ST' instead of 'S ∘ T' for the composition of linear maps. 


If ξ = <ξ_1, ..., ξ_n> ∈ V^n and σ ∈ 𝔖_n, then we can "apply σ to ξ", or "per- 
mute the elements of <ξ_1, ..., ξ_n> through σ". We mean, of course, that 
we can replace <ξ_1, ..., ξ_n> by <ξ_{σ(1)}, ..., ξ_{σ(n)}>, that is, we can replace ξ 
by ξ ∘ σ. 


Permuting the variables changes a functional f ∈ (V*)^ⓝ into a new such 
functional. Specifically, given f ∈ (V*)^ⓝ and σ ∈ 𝔖_n, we define f^σ by 

    f^σ(ξ) = f(ξ ∘ σ^{−1}) = f(ξ_{σ^{−1}(1)}, ..., ξ_{σ^{−1}(n)}). 

The reason for using σ^{−1} instead of σ is, in part, that it gives us the following 
formula. 




Lemma 3.1. f^{σ_1σ_2} = (f^{σ_1})^{σ_2}. 


Proof. f^{σ_1σ_2}(ξ) = f(ξ ∘ (σ_1 ∘ σ_2)^{−1}) = f(ξ ∘ (σ_2^{−1} ∘ σ_1^{−1})) = f((ξ ∘ σ_2^{−1}) ∘ σ_1^{−1}) = 
f^{σ_1}(ξ ∘ σ_2^{−1}) = (f^{σ_1})^{σ_2}(ξ). □ 
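
The bookkeeping in Lemma 3.1 is easy to get wrong by hand, so a mechanical check may help. The Python sketch below is an illustration under our own conventions: permutations are tuples on {0, ..., n − 1}, a functional is an ordinary function of n arguments, and the names compose, inverse and act are hypothetical.

```python
def compose(sigma, rho):
    # (sigma o rho)(i) = sigma(rho(i)).
    return tuple(sigma[rho[i]] for i in range(len(rho)))

def inverse(sigma):
    # The inverse permutation: inverse(sigma)[sigma[i]] = i.
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)

def act(f, sigma):
    # f^sigma(xi) = f(xi o sigma^{-1}): slot i receives xi_{sigma^{-1}(i)}.
    inv = inverse(sigma)
    return lambda *xi: f(*[xi[inv[i]] for i in range(len(xi))])
```

With these definitions, act(f, compose(s1, s2)) and act(act(f, s1), s2) should agree on every argument list.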


Theorem 3.1. For each σ in 𝔖_n the mapping T_σ defined by f ↦ f^σ is a linear 
isomorphism of (V*)^ⓝ onto itself. The mapping σ ↦ T_σ is an antihomo- 
morphism from the group 𝔖_n to the group of nonsingular elements of 
Hom((V*)^ⓝ). 


Proof. Permuting the variables does not alter the property of multilinearity, so 
T_σ maps (V*)^ⓝ into itself. It is linear, since (af + bg)^σ = af^σ + bg^σ. And 
T_{ρσ} = T_σ ∘ T_ρ, because f^{ρσ} = (f^ρ)^σ. Thus σ ↦ T_σ preserves products, but in the 
reverse order. This is why it is called an antihomomorphism. Finally, 

    T_{σ^{−1}} ∘ T_σ = T_{σσ^{−1}} = T_I = I, 

so that T_σ is invertible (nonsingular, an isomorphism). □ 


The mapping σ ↦ T_σ is a representation (really an antirepresentation) of the 
group 𝔖_n by linear transformations on (V*)^ⓝ. 


Lemma 3.2. Each T_σ carries the basis {μ_p} into itself, and so is a permu- 
tation on the basis. 


Proof. We have (μ_p)^σ(ξ) = μ_p(ξ ∘ σ^{−1}) = Π_{i=1}^n μ_{p_i}(ξ_{σ^{−1}(i)}). Setting j = σ^{−1}(i) 
and so having i = σ(j), this product can be rewritten Π_{j=1}^n μ_{p_{σ(j)}}(ξ_j) = μ_{p∘σ}(ξ). 
Thus 

    (μ_p)^σ = μ_{p∘σ}, 

and since p ↦ p ∘ σ is a permutation on m̄^n̄, we are done. □ 


4. THE SIGN OF A PERMUTATION 


We consider now the special polynomial E on R^n defined by 

    E(x) = E(x_1, ..., x_n) = Π_{1≤i<j≤n} (x_i − x_j). 

This is the product over all pairs <i, j> ∈ n̄ × n̄ such that i < j. This set of 
ordered pairs is in one-to-one correspondence with the collection P of all pair 
sets {i, j} ⊂ n̄ such that i ≠ j, the ordered pair being obtained from the un- 
ordered pair by putting it in its natural order. Now it is clear that for any 
permutation σ ∈ 𝔖_n, the mapping {i, j} ↦ {σ(i), σ(j)} is a permutation of P. 
This means that the factors in the polynomial E^σ(x) = E(x ∘ σ^{−1}) are exactly the 
same as in the polynomial E(x) except for the changes of sign that occur when σ 
reverses the order of a pair. Therefore, if ν is the number of these reversals, we 
have E^σ = (−1)^ν E. The mapping σ ↦ (−1)^ν is designated 'sgn' (and called 
"sign"). Thus sgn is a function from 𝔖_n to {1, −1} such that E^σ = (sgn σ)E, 
for all σ ∈ 𝔖_n. It follows that 

    sgn ρσ = (sgn ρ)(sgn σ), 

for (sgn ρσ)E = E^{ρσ} = (E^ρ)^σ = (sgn ρ)E^σ = (sgn ρ)(sgn σ)E, and we can 
evaluate E at any n-tuple x such that E(x) ≠ 0 and cancel the factor E(x). 
Also 

    sgn σ = −1 if σ is a transposition. 

This is clear if σ interchanges adjacent numbers because it then changes the 
sign of just one factor in E(x); we leave the general case as an exercise for the 
interested reader. 
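
The definition of sgn through reversals of order lends itself to a direct computation. The Python sketch below is only an illustration: it counts reversed pairs, uses tuples on {0, ..., n − 1} for permutations, and evaluates E at x ∘ σ rather than x ∘ σ^{−1} (the two give the same sign, since a permutation and its inverse reverse the same number of pairs). All names are our own.

```python
from itertools import permutations

def sgn(sigma):
    # (-1)^(number of pairs i < j whose order sigma reverses).
    n = len(sigma)
    reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                    if sigma[i] > sigma[j])
    return -1 if reversals % 2 else 1

def compose(rho, sigma):
    # (rho o sigma)(i) = rho(sigma(i)).
    return tuple(rho[sigma[i]] for i in range(len(sigma)))

def E(x):
    # E(x) = product over i < j of (x_i - x_j).
    value = 1
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            value *= x[i] - x[j]
    return value
```

Running over all of 𝔖_4 one can confirm that sgn is multiplicative, that permuting the entries of x multiplies E(x) by the sign, and that a non-adjacent transposition such as (0, 3, 2, 1) also has sign −1.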


5. THE SUBSPACE 𝔞^n OF ALTERNATING TENSORS 


Definition. A covariant tensor f ∈ (V*)^ⓝ is symmetric if f^σ = f for all 
σ ∈ 𝔖_n. 


If f is bilinear (f ∈ (V*)^②), this is just the condition f(ξ, η) = f(η, ξ) for 
all ξ, η ∈ V. 


Definition. A covariant tensor f ∈ (V*)^ⓝ is antisymmetric or alternating if 
f^σ = (sgn σ)f for all σ ∈ 𝔖_n. 


Since each σ is a product of transpositions, this can also be expressed as the 
fact that f just changes sign if two of its arguments are interchanged. In the 
case of a bilinear functional it is the condition f(ξ, η) = −f(η, ξ) for all ξ, η ∈ V. 
It is important to note that if f is alternating, then f(ξ) = 0 whenever the n-tuple 
ξ = <ξ_1, ..., ξ_n> is not injective (ξ_i = ξ_j for some i ≠ j). The set of all 
symmetric elements of (V*)^ⓝ is clearly a subspace, as is also the (for us) more 
important set 𝔞^n of all alternating elements. There is an important linear pro- 
jection from (V*)^ⓝ to 𝔞^n which we now describe. 


Theorem 5.1. The mapping f ↦ (1/n!) Σ_{σ∈𝔖_n} (sgn σ)f^σ is a projection Ω 
from (V*)^ⓝ to 𝔞^n. 


Proof. We first check that Ωf ∈ 𝔞^n for every f in (V*)^ⓝ. We have (Ωf)^ρ = 
(1/n!) Σ_σ (sgn σ)f^{σρ}. Now sgn σ = (sgn σρ)(sgn ρ). Setting σ′ = σ ∘ ρ and 
remembering that σ ↦ σ′ is a bijection, we thus have 

    (Ωf)^ρ = ((sgn ρ)/n!) Σ_{σ′} (sgn σ′)f^{σ′} = (sgn ρ)(Ωf). 

Hence Ωf ∈ 𝔞^n. 
If f is already in 𝔞^n, then f^σ = (sgn σ)f and Ωf = (1/n!) Σ_{σ∈𝔖_n} f. Since 𝔖_n 
has n! elements, Ωf = f. Thus Ω is a projection from (V*)^ⓝ to 𝔞^n. □ 
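
The projection Ω can be written out directly from its defining formula. The Python sketch below is an illustration only: functionals are ordinary functions of n vector arguments, exact rational arithmetic is used to avoid rounding, and the helper names are our own assumptions.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sgn(sigma):
    # (-1)^(number of reversed pairs).
    n = len(sigma)
    reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                    if sigma[i] > sigma[j])
    return -1 if reversals % 2 else 1

def act(f, sigma):
    # f^sigma(xi) = f(xi o sigma^{-1}).
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return lambda *xi: f(*[xi[inv[i]] for i in range(len(xi))])

def alternate(f, n):
    # Omega f = (1/n!) * sum over sigma of sgn(sigma) * f^sigma.
    perms = list(permutations(range(n)))
    def omega_f(*xi):
        total = sum(sgn(s) * act(f, s)(*xi) for s in perms)
        return Fraction(total, factorial(n))
    return omega_f
```

Applied to the trilinear functional f(u, v, w) = u_1 v_2 w_3 on R^3, alternate should give a functional that changes sign under an interchange of two arguments, vanishes on repeated arguments, and is left unchanged by a second application of alternate.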


Lemma 5.1. Ω(f^ρ) = (sgn ρ)Ωf. 


Proof. The formula for Ω(f^ρ) is the same as that for (Ωf)^ρ except that ρσ replaces 
σρ. The proof is thus the same as the one for the theorem above. □ 




Theorem 5.2. The vector space 𝔞^n of alternating n-linear functionals over 
the m-dimensional vector space V has dimension (m choose n). 


Proof. If f ∈ 𝔞^n and f = Σ_p c_pμ_p, then since f^σ = (sgn σ)f, we have 
Σ_p c_pμ_{p∘σ} = Σ_p (sgn σ)c_pμ_p for any σ in 𝔖_n. Setting p ∘ σ = q, the left sum 
becomes Σ_q c_{q∘σ^{−1}}μ_q, and since the basis expansion is unique, we must have 
c_{q∘σ^{−1}} = (sgn σ)c_q, or c_p = (sgn σ)c_{p∘σ}, for all p ∈ m̄^n̄. Working backward, we see, 
conversely, that this condition implies that f^σ = (sgn σ)f. Thus f ∈ 𝔞^n if and 
only if its coordinate function c_p satisfies the identity 

    c_p = (sgn σ)c_{p∘σ} for all p ∈ m̄^n̄ and all σ ∈ 𝔖_n. 

This has many consequences. For one thing, c_p = 0 unless p is one-to-one 
(injective). For if p_i = p_j and σ is the transposition interchanging i and j, then 
p ∘ σ = p, c_p = (sgn σ)c_{p∘σ} = −c_p, and so c_p = 0. Since no p can be injective 
if n > m, we see that in this case the only element of 𝔞^n is the zero functional. 
Thus n > m ⇒ dim 𝔞^n = 0. 

Now suppose that n ≤ m. For any injective p, the set {p ∘ σ : σ ∈ 𝔖_n} 
consists of all the (injective) n-tuples with the same range set as p. There are 
clearly n! of them. Exactly one q = p ∘ σ counts off the range set in its natural 
order, i.e., satisfies q_1 < q_2 < ··· < q_n. We select this unique q as the repre- 
sentative of all the elements p ∘ σ having this range. The collection C of these 
canonical (representative) q's is thus in one-to-one correspondence with the 
collection of all (range) subsets of m̄ = {1, ..., m} of size n. 

Each injective p ∈ m̄^n̄ is uniquely expressible as p = q ∘ σ for some q ∈ C, 
σ ∈ 𝔖_n. Thus each f in 𝔞^n is the sum Σ_{q∈C} Σ_{σ∈𝔖_n} c_{q∘σ}μ_{q∘σ}. Since c_{q∘σ} = (sgn σ)c_q, 
this sum can be rewritten Σ_{q∈C} c_q Σ_σ (sgn σ)μ_{q∘σ} = Σ_{q∈C} c_qν_q, where we have 
set ν_q = Σ_σ (sgn σ)μ_{q∘σ} = n! Ω(μ_q). 

We are just about done. Each ν_q is alternating, since it is in the range of Ω, 
and the expansion 

    f = Σ_{q∈C} c_qν_q 

which we have just found to be valid for every f ∈ 𝔞^n shows that the set 
{ν_q : q ∈ C} spans 𝔞^n. It is also independent, since Σ_{q∈C} c_qν_q = Σ_{p∈m̄^n̄} c_pμ_p 
and the set {μ_p} is independent. It is therefore a basis for 𝔞^n. 

Now the total number of injective mappings p from n̄ = {1, ..., n} to 
m̄ = {1, ..., m} is m(m − 1) ··· (m − n + 1), for the first element can be 
chosen in m ways, the second in m − 1 ways, and so on down through n choices, 
the last element having m − (n − 1) = m − n + 1 possibilities. We have seen 
above that the number of these p's with a given range is n!. Therefore, the 
number of different range sets is 

    m(m − 1) ··· (m − n + 1)/n! = m!/(n!(m − n)!) = (m choose n). 

And this is the number of elements q ∈ C. □ 
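
The counting argument at the end of the proof can be replayed by enumeration. The Python sketch below is an illustration with 0-based indices; the function names are our own.

```python
from itertools import combinations, permutations
from math import comb, factorial

def canonical_indices(m, n):
    # Increasing n-tuples q_1 < ... < q_n from {0..m-1}: one per range set.
    return list(combinations(range(m), n))

def injective_tuples(m, n):
    # All injective maps from {0..n-1} to {0..m-1}, written as n-tuples.
    return list(permutations(range(m), n))
```

For m = 5 and n = 3 there are 60 injective tuples, each range set is hit by 3! = 6 of them, and so there are 10 canonical tuples, in agreement with the binomial coefficient. When n > m the list of canonical tuples is empty.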


The case n = m is very important. Now C contains only one element, the 
identity I in 𝔖_m, so that 

    f = c_Iν_I = c_I Σ_σ (sgn σ)μ_σ 

and 

    f(ξ_1, ..., ξ_m) = c_I Σ_σ (sgn σ)μ_{σ(1)}(ξ_1) ··· μ_{σ(m)}(ξ_m) 

                     = c_I Σ_σ (sgn σ)x_{σ(1)}^1 ··· x_{σ(m)}^m. 

This is essentially the formula for the determinant, as we shall see. 


6. THE DETERMINANT 

We saw in Section 5 that the dimension of the space 𝔞^m of alternating m-forms 
over an m-dimensional V is (m choose m) = 1. Thus, to within scalar multiples there is only 
one alternating m-linear functional D over V = R^m, and we can adjust the 
constant so that D(δ^1, ..., δ^m) = 1. This uniquely determined m-form is the 
determinant functional, and its value D(x^1, ..., x^m) at the m-tuple <x^1, ..., x^m> 
is the determinant of the matrix x = {x_{ij}} whose jth column is x^j for j = 1, ..., m. 


Lemma 6.1. D(t) = D(t^1, ..., t^m) = Σ_{σ∈𝔖_m} (sgn σ)t_{σ(1),1} ··· t_{σ(m),m}. 


Proof. This is just the last remark of the last section, with the constant c_I = 1, 
since D(δ^1, ..., δ^m) = 1, and with the notation changed to the usual matrix 
form t_{ij}. □ 
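
The formula of Lemma 6.1 translates almost word for word into code. The Python sketch below is only an illustration (it sums over all m! permutations, so it is usable only for small m); matrices are lists of rows, indices are 0-based, and the names sgn, det and transpose are our own.

```python
from itertools import permutations

def sgn(sigma):
    # (-1)^(number of reversed pairs).
    n = len(sigma)
    reversals = sum(1 for i in range(n) for j in range(i + 1, n)
                    if sigma[i] > sigma[j])
    return -1 if reversals % 2 else 1

def det(t):
    # D(t) = sum over sigma of sgn(sigma) * t[sigma(1)][1] * ... * t[sigma(m)][m].
    m = len(t)
    total = 0
    for sigma in permutations(range(m)):
        term = sgn(sigma)
        for j in range(m):
            term *= t[sigma[j]][j]
        total += term
    return total

def transpose(t):
    # The transposed matrix t*.
    return [list(row) for row in zip(*t)]
```

On a sample integer matrix one can confirm that the identity matrix has determinant 1, that interchanging two columns changes the sign, and that a matrix and its transpose have the same determinant.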

Corollary 1. D(t*) = D(t). 


Proof. If we reorder the factors of the product t_{σ(1),1} ··· t_{σ(m),m} in the order of the 
values σ(i), the product becomes t_{1,ρ(1)} ··· t_{m,ρ(m)}, where ρ = σ^{−1}. Since 

    σ ↦ ρ = σ^{−1} 

is a bijection from 𝔖_m to 𝔖_m, and since sgn(σ^{−1}) = sgn σ, the sum in the lemma 
can be rewritten as Σ_{ρ∈𝔖_m} (sgn ρ)t_{1,ρ(1)} ··· t_{m,ρ(m)}. But this is 

    Σ_ρ (sgn ρ)t*_{ρ(1),1} ··· t*_{ρ(m),m} = D(t*). □ 


Corollary 2. D(t) is an alternating m-linear functional of the rows of t. 


Now let $\dim V = m$, and let $f$ be any nonzero alternating $m$-form on $V$. For
any $T$ in $\operatorname{Hom} V$ the functional $f_T$ defined by $f_T(\xi_1, \dots, \xi_m) = f(T\xi_1, \dots, T\xi_m)$
also belongs to $\mathfrak{a}^m$. Since $\mathfrak{a}^m$ is one-dimensional, $f_T = k_T f$ for some constant $k_T$.
Moreover, $k_T$ is independent of $f$, since if $g_T = k_T' g$ and $g = cf$, we must have
$c f_T = k_T' c f$ and $k_T' = k_T$. This unique constant is called the determinant of $T$;
we shall designate it $\Delta(T)$. Note that $\Delta(T)$ is defined independently of any basis
for $V$.


Theorem 6.1. $\Delta(S \circ T) = \Delta(S)\, \Delta(T)$.




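Proof.

$$\begin{aligned}
\Delta(S \circ T) f(\xi_1, \dots, \xi_m) &= f((S \circ T)(\xi_1), \dots, (S \circ T)(\xi_m)) \\
&= f(S(T(\xi_1)), \dots, S(T(\xi_m))) \\
&= \Delta(S) f(T(\xi_1), \dots, T(\xi_m)) = \Delta(S)\, \Delta(T) f(\xi_1, \dots, \xi_m).
\end{aligned}$$

Now divide out $f$. □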


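Theorem 6.2. If $\theta$ is an isomorphism from $V$ to $W$, and if $T \in \operatorname{Hom} V$ and
$S = \theta \circ T \circ \theta^{-1}$, then $\Delta(S) = \Delta(T)$.


Proof. If $f$ is any nonzero alternating $m$-form on $W$, and if we define $g$ by
$g(\xi_1, \dots, \xi_m) = f(\theta\xi_1, \dots, \theta\xi_m)$, then $g$ is a nonzero alternating $m$-form on $V$.

Now $f(S \circ \theta\xi_1, \dots, S \circ \theta\xi_m) = \Delta(S) f(\theta\xi_1, \dots, \theta\xi_m) = \Delta(S)\, g(\xi_1, \dots, \xi_m)$,
and also $f(S \circ \theta\xi_1, \dots, S \circ \theta\xi_m) = f(\theta \circ T\xi_1, \dots, \theta \circ T\xi_m) = g(T\xi_1, \dots, T\xi_m) =
\Delta(T)\, g(\xi_1, \dots, \xi_m)$. Thus $\Delta(S) g = \Delta(T) g$ and $\Delta(S) = \Delta(T)$. □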


The reader will expect the two notions of determinant we have introduced 
to agree; we prove this now. 


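Corollary 1. If $t$ is the matrix of $T$ with respect to some basis in $V$, then
$D(t) = \Delta(T)$.


Proof. If $\theta$ is the coordinate isomorphism, then $\bar T = \theta \circ T \circ \theta^{-1}$ is in $\operatorname{Hom} \mathbb{R}^m$
and $\Delta(\bar T) = \Delta(T)$ by the theorem. Also, the columns of $t$ are the $m$-tuple
$\bar T(\delta^1), \dots, \bar T(\delta^m)$. Thus $D(t) = D(t^1, \dots, t^m) = D(\bar T(\delta^1), \dots, \bar T(\delta^m)) =
\Delta(\bar T)\, D(\delta^1, \dots, \delta^m) = \Delta(\bar T)$. Altogether we have $D(t) = \Delta(T)$. □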


Corollary 2. If $s$ and $t$ are $m \times m$ matrices, then $D(s \cdot t) = D(s)\, D(t)$.

Proof. $D(s \cdot t) = \Delta(S \circ T) = \Delta(S)\, \Delta(T) = D(s)\, D(t)$. □

Corollary 3. $D(t) = 0$ if and only if $t$ is singular.


Proof. If $t$ is nonsingular, then $t^{-1}$ exists and $D(t)\, D(t^{-1}) = D(t\, t^{-1}) =
D(I) = 1$. In particular, $D(t) \ne 0$. If $t$ is singular, some column, say $t^1$, is a
linear combination of the others, $t^1 = \sum_{i=2}^m c_i t^i$, and $D(t^1, \dots, t^m) =
\sum_{i=2}^m c_i D(t^i, t^2, \dots, t^m) = 0$, since each term in the sum evaluates $D$ at an
$m$-tuple having two identical elements, and so is 0 by the alternating property. □
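Corollaries 2 and 3 are easy to spot-check numerically. The following sketch assumes integer matrices stored as lists of rows and evaluates the determinant by the permutation sum; `det` and `matmul` are hypothetical helper names for this illustration.

```python
from itertools import permutations
from math import prod

def det(t):
    """Determinant as a signed sum over permutations (suitable for small matrices)."""
    m = len(t)

    def sgn(p):
        # (-1) raised to the number of inversions of p.
        return (-1) ** sum(p[i] > p[j] for i in range(m) for j in range(i + 1, m))

    return sum(sgn(p) * prod(t[p[c]][c] for c in range(m)) for p in permutations(range(m)))

def matmul(s, t):
    """Product of two square matrices of the same size."""
    m = len(s)
    return [[sum(s[i][k] * t[k][j] for k in range(m)) for j in range(m)] for i in range(m)]

s = [[1, 2], [3, 4]]
t = [[0, 1], [5, 2]]
assert det(matmul(s, t)) == det(s) * det(t)  # Corollary 2

# Third column equals the first column plus twice the second, so the matrix is singular.
singular = [[1, 0, 1], [0, 1, 2], [2, 1, 4]]
assert det(singular) == 0  # Corollary 3
```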


We still have to show that $\Delta$ has all the properties we ascribed to it in
Chapter 2. Some of them are in hand. We know that $\Delta(S \circ T) = \Delta(S)\, \Delta(T)$,
and the one- and two-dimensional properties are trivial. Thus, if $T$ interchanges
independent vectors $\alpha_1$ and $\alpha_2$ in a two-dimensional space, then its matrix with
respect to them as a basis is $t = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, and so $\Delta(T) = D(t) = -1$.


The following lemma will complete the job. 


Lemma 6.2. Consider $D(t) = D(t^1, \dots, t^m)$ under the special assumption
that $t^m = \delta^m$. If $s$ is the $(m - 1) \times (m - 1)$ matrix obtained from the
$m \times m$ matrix $t$ by deleting its last row and last column, then $D(s) = D(t)$.




Proof. This can be made to follow from an inspection of the formula of Lemma
6.1, but we shall argue directly.

If $t$ has $\delta^m$ also as its $j$th column for some $j \ne m$, then of course $D(t) = 0$
by the alternating property. This means that $D(t)$ is unchanged if the $j$th column
is altered in the $m$th place, and therefore $D(t)$ depends only on the values $t_{ij}$ in
the rows $i \ne m$. That is, $D(t)$ depends only on $s$. Now $t \mapsto s$ is clearly a
surjective mapping to $\mathbb{R}^{(m-1) \times (m-1)}$ and, as a function of $s$, $D(t)$ is alternating
$(m - 1)$-linear. It therefore is a constant multiple of the determinant $D$
on $\mathbb{R}^{(m-1) \times (m-1)}$. To see what the constant is, we evaluate at

$$t = <\delta^1, \dots, \delta^m>.$$

Then $D(s) = 1 = D(t)$ for this special choice, and so $D(s) = D(t)$ in general. □


In order to get a hold on the remaining two properties, we consider an
$m \times m$ matrix $t$ whose last $m - n$ columns are $\delta^{n+1}, \dots, \delta^m$, and we apply
the above lemma repeatedly. We have, first, $D(t) = D((t)^{mm})$, where $(t)^{mm}$
is the $(m - 1) \times (m - 1)$ matrix obtained from $t$ by deleting the last row and
the last column. Since this matrix has $\delta^{m-1}$ as its last column ($\delta^{m-1}$ being now
an $(m - 1)$-tuple), the same argument shows that its determinant is the same
as that of the $(m - 2) \times (m - 2)$ matrix obtained from it in the same way.
We can keep on going as long as the $\delta$-columns last, and thus see that $D(t)$ is the
determinant of the $n \times n$ matrix that is the upper left corner of $t$. If we interpret
this in terms of transformations, we have the following lemma.


Lemma 6.3. Suppose that $V$ is $m$-dimensional and that $T$ in $\operatorname{Hom} V$ is
the identity on an $(m - n)$-dimensional subspace $X$. Let $Y$ be a complement
of $X$, and let $p$ be the projection on $Y$ along $X$. Then $p \circ (T \restriction Y)$ can
be considered an element of $\operatorname{Hom} Y$ and $\Delta(T) = \Delta_Y(p \circ (T \restriction Y))$.


Proof. Let $\alpha_1, \dots, \alpha_n$ be a basis for $Y$, and let $\alpha_{n+1}, \dots, \alpha_m$ be a basis for $X$.
Then $\{\alpha_i\}_1^m$ is a basis for $V$, and since $T(\alpha_i) = \alpha_i$ for $i = n + 1, \dots, m$, the
matrix for $T$ has $\delta^i$ as its $i$th column for $i = n + 1, \dots, m$. The lemma will
therefore follow from our above discussion if we can show that the matrix of
$p \circ (T \restriction Y)$ in $\operatorname{Hom} Y$ is the $n \times n$ upper left corner of $t$. The student should
be able to see this if he visualizes what vector $(p \circ T)(\alpha_i)$ is for $i \le n$. □


Corollary. In the above situation if $Y$ is also invariant under $T$, then
$\Delta(T) = \Delta_Y(T \restriction Y)$.


Proof. The proof follows immediately since now $p \circ (T \restriction Y) = T \restriction Y$. □


If the roles of $X$ and $Y$ are interchanged, both being invariant under $T$ and
$T$ being the identity on $Y$, then this same lemma tells us that $\Delta(T) = \Delta_X(T \restriction X)$.
If we only know that $X$ and $Y$ are $T$-invariant, then we can factor $T$ into a
commuting product $T = T_1 \circ T_2 = T_2 \circ T_1$, where $T_1$ and $T_2$ are of the two
more special types discussed above, and so have the rule $\Delta(T) = \Delta(T_1)\, \Delta(T_2) =
\Delta_X(T \restriction X)\, \Delta_Y(T \restriction Y)$, another of our properties listed in Chapter 2.




The final rule is also a consequence of the above lemma. If $T$ is the identity
on $X$ and also on $V/X$, then it isn't too hard to see that $p \circ (T \restriction Y)$ is the
identity, as an element of $\operatorname{Hom} Y$, and so $\Delta(T) = 1$ by the lemma.

We now prove the theorem concerning "expansion by minors (or cofactors)".
Let $t$ be an $m \times m$ matrix, and let $(t)^{pr}$ be the $(m - 1) \times (m - 1)$ submatrix
obtained from $t$ by deleting the $p$th row and $r$th column. Then,


Theorem 6.3. $D(t) = \sum_{i=1}^m (-1)^{i+r}\, t_{ir}\, D((t)^{ir})$. That is, we can evaluate
$D(t)$ by going down the $r$th column, multiplying each element by the
determinant of the $(m - 1) \times (m - 1)$ matrix associated with it, and
adding. The two occurrences of '$D$' in the theorem are of course over
dimensions $m$ and $m - 1$, respectively.


Proof. Consider $D(t) = D(t^1, \dots, t^m)$ under the special assumption that
$t^r = \delta^p$. Since $D(t)$ is an alternating $m$-linear functional both of the columns of $t$
and of the rows of $t$, we can move the $r$th column and $p$th row to the
right-bottom border, and apply Lemma 6.2. Thus

$$D(t) = (-1)^{m-r}(-1)^{m-p}\, D((t)^{pr}) = (-1)^{p+r}\, D((t)^{pr}),$$

assuming that the $r$th column of $t$ is $\delta^p$. In general, $t^r = \sum_{i=1}^m t_{ir}\, \delta^i$, and if we
expand $D(t^1, \dots, t^m)$ with respect to this sum in the $r$th place, and if we use the
above evaluation of the separate terms of the resulting sum, we get $D(t) =
\sum_{i=1}^m (-1)^{i+r}\, t_{ir}\, D((t)^{ir})$. □


Corollary 1. If $s \ne r$, then $\sum_{i=1}^m (-1)^{i+r}\, t_{is}\, D((t)^{ir}) = 0$.


Proof. We now have the expansion of the theorem for a matrix with identical $s$th
and $r$th columns, and the determinant of this matrix is zero by the alternating
property. □
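The expansion theorem and its corollary can be written as a short recursive routine. This is a sketch under stated assumptions: indices are 0-based (the sign $(-1)^{i+r}$ is unaffected, since both indices shift by one), the matrix is a list of rows, and the names `minor`, `det_by_column`, and `column_cofactor_sum` are hypothetical.

```python
def minor(t, p, r):
    """The submatrix (t)^{pr}: delete row p and column r (0-based)."""
    m = len(t)
    return [[t[i][j] for j in range(m) if j != r] for i in range(m) if i != p]

def det_by_column(t, r=0):
    """Expand D(t) down column r; the minors are expanded down their first column."""
    m = len(t)
    if m == 1:
        return t[0][0]
    return sum((-1) ** (i + r) * t[i][r] * det_by_column(minor(t, i, r)) for i in range(m))

def column_cofactor_sum(t, s, r):
    """Sum over i of (-1)^(i+r) * t[i][s] * D((t)^{ir}).

    For s == r this is the expansion of D(t); for s != r the corollary says it is 0.
    """
    m = len(t)
    return sum((-1) ** (i + r) * t[i][s] * det_by_column(minor(t, i, r)) for i in range(m))

t = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
# Every column gives the same value.
assert det_by_column(t, 0) == det_by_column(t, 1) == det_by_column(t, 2)
# Entries of column 0 paired with cofactors of column 1 sum to zero.
assert column_cofactor_sum(t, 0, 1) == 0
```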


For simpler notation, set $c_{ij} = (-1)^{i+j}\, D((t)^{ij})$. This is called the cofactor
of the element $t_{ij}$ in $t$. Our two results together say that

$$\sum_{i=1}^m c_{ir}\, t_{is} = \delta^r_s\, D(t).$$

In particular, if $D(t) \ne 0$, then the matrix $s$ whose entries are $s_{ij} = c_{ji}/D(t)$ is
the inverse of $t$. This observation gives us a neat way to express the solution of
a system of linear equations. We want to solve $t \cdot x = y$ for $x$ in terms of $y$,
supposing that $D(t) \ne 0$. Since $s$ is the inverse of $t$, we have $x = s \cdot y$. That is,
$x_j = \sum_{i=1}^m s_{ji}\, y_i = \left(\sum_{i=1}^m y_i\, c_{ij}\right)/D(t)$ for $j = 1, \dots, m$. According to our
expansion theorem, the numerator in this expression is exactly the determinant
$d_j$ of the matrix obtained from $t$ by replacing its $j$th column by the $m$-tuple $y$.
Hence, with $d_j$ defined this way, the solution to $t \cdot x = y$ is the $m$-tuple

$$x = \left< \frac{d_1}{D(t)}, \dots, \frac{d_m}{D(t)} \right>.$$

This is Cramer's rule. It was stated in slightly different notation in Section 2.5.
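Cramer's rule as just derived can be sketched in a few lines. The example assumes integer entries (so that exact arithmetic with `Fraction` applies) and a matrix stored as a list of rows; `det` and `cramer_solve` are hypothetical names. For large systems elimination is far cheaper, so this is an illustration of the formula rather than a recommended solver.

```python
from fractions import Fraction

def det(t):
    """Determinant by cofactor expansion down the first column."""
    m = len(t)
    if m == 1:
        return t[0][0]
    total = 0
    for i in range(m):
        # Delete row i and the first column.
        sub = [row[1:] for k, row in enumerate(t) if k != i]
        total += (-1) ** i * t[i][0] * det(sub)
    return total

def cramer_solve(t, y):
    """Solve t x = y via x_j = d_j / D(t), where d_j replaces column j of t by y."""
    d = det(t)
    if d == 0:
        raise ValueError("D(t) = 0, so Cramer's rule does not apply")
    m = len(t)
    solution = []
    for j in range(m):
        t_j = [[y[i] if c == j else t[i][c] for c in range(m)] for i in range(m)]
        solution.append(Fraction(det(t_j), d))
    return solution

x = cramer_solve([[2, 1], [1, 3]], [3, 5])
# Substitute back into both equations.
assert 2 * x[0] + 1 * x[1] == 3
assert 1 * x[0] + 3 * x[1] == 5
```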




7. THE EXTERIOR ALGEBRA 


Our final job is to introduce a multiplication operation between alternating
$n$-linear functionals (also now called exterior $n$-forms). We first extend the tensor
product operation that we have used to fashion elementary covariant tensors out
of functionals.


Definition. If $f \in (V^*)^{(n)}$ and $g \in (V^*)^{(l)}$, then $f \otimes g$ is that element of
$(V^*)^{(n+l)}$ defined as follows:

$$f \otimes g(\xi_1, \dots, \xi_{n+l}) = f(\xi_1, \dots, \xi_n)\, g(\xi_{n+1}, \dots, \xi_{n+l}).$$

We naturally ask how this operation combines with the projection $\Omega$ of
$(V^*)^{(n+l)}$ onto $\mathfrak{a}^{n+l}$.


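Theorem 7.1. $\Omega(f \otimes g) = \Omega(f \otimes \Omega g) = \Omega(\Omega f \otimes g)$.

Proof. We have

$$\begin{aligned}
\Omega(f \otimes \Omega g) &= \frac{1}{(n+l)!} \sum_{\sigma} (\operatorname{sgn} \sigma)(f \otimes \Omega g)^{\sigma} \\
&= \frac{1}{(n+l)!} \sum_{\sigma} (\operatorname{sgn} \sigma) \Big( f \otimes \frac{1}{l!} \sum_{\rho} (\operatorname{sgn} \rho)\, g^{\rho} \Big)^{\sigma} \\
&= \frac{1}{(n+l)!\, l!} \sum_{\sigma, \rho} (\operatorname{sgn} \sigma)(\operatorname{sgn} \rho)(f \otimes g^{\rho})^{\sigma}.
\end{aligned}$$

We can regard $\rho$ as acting on the full $n + l$ places of $f \otimes g$ by taking it as the
identity on the first $n$ places. Then $(f \otimes g^{\rho})^{\sigma} = (f \otimes g)^{\rho\sigma}$. Set $\rho\sigma = \sigma'$. For
each $\sigma'$ there are exactly $l!$ pairs $<\rho, \sigma>$ with $\rho\sigma = \sigma'$, namely, the pairs
$\{<\rho, \rho^{-1}\sigma'> : \rho \in S_l\}$. Thus the above sum is

$$\frac{l!}{(n+l)!\, l!} \sum_{\sigma'} (\operatorname{sgn} \sigma')(f \otimes g)^{\sigma'} = \Omega(f \otimes g).$$

The proof for $\Omega(\Omega f \otimes g)$ is essentially the same. □

Definition. If $f \in \mathfrak{a}^n$ and $g \in \mathfrak{a}^l$, then $f \wedge g = \binom{n+l}{n}\, \Omega(f \otimes g)$.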


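Lemma 7.1. $f_1 \wedge f_2 \wedge \cdots \wedge f_k = \dfrac{n!}{n_1!\, n_2! \cdots n_k!}\, \Omega(f_1 \otimes \cdots \otimes f_k)$,
where $n_i$ is the order of $f_i$, $i = 1, \dots, k$, and $n = \sum_1^k n_i$.


Proof. This is simply an induction, using the definition of the wedge operation $\wedge$
and the above theorem. □


Corollary. If $\lambda_i \in V^*$, $i = 1, \dots, n$, then

$$\lambda_1 \wedge \cdots \wedge \lambda_n = n!\, \Omega(\lambda_1 \otimes \cdots \otimes \lambda_n).$$

In particular, if $q_1 < \cdots < q_n$ and $\{\mu_i\}_1^m$ is a basis for $V^*$, then

$$\mu_{q_1} \wedge \cdots \wedge \mu_{q_n} = n!\, \Omega(\mu_q) = \text{the basis element } v_q \text{ of } \mathfrak{a}^n.$$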




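Theorem 7.2. If $f \in \mathfrak{a}^n$ and $g \in \mathfrak{a}^l$, then $g \wedge f = (-1)^{ln} f \wedge g$. In
particular, $\lambda \wedge \lambda = 0$ for $\lambda \in V^*$.


Proof. We have $g \otimes f = (f \otimes g)^{\sigma}$, where $\sigma$ is the permutation moving each of
the last $l$ places over each of the first $n$ places. Thus $\sigma$ is the product of $ln$
transpositions, $\operatorname{sgn} \sigma = (-1)^{ln}$, and

$$\Omega(g \otimes f) = \Omega((f \otimes g)^{\sigma}) = (\operatorname{sgn} \sigma)\, \Omega(f \otimes g) = (-1)^{ln}\, \Omega(f \otimes g).$$

We multiply by $\binom{n+l}{n}$ and have the theorem. □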
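The definitions of $\otimes$, $\Omega$, and $\wedge$ can be modelled directly, with a multilinear functional represented as a Python callable together with its order. This is a minimal sketch: the names `tensor`, `alternate`, `wedge`, and `covector` are hypothetical, vectors are tuples of integers, and $\Omega$ is taken to be $1/k!$ times the signed sum over all permutations of the arguments.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def sgn(p):
    """Sign of a permutation given as a tuple of 0-based images."""
    return (-1) ** sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))

def tensor(f, n, g, l):
    """(f (x) g)(xi_1..xi_{n+l}) = f(xi_1..xi_n) * g(xi_{n+1}..xi_{n+l})."""
    return lambda *xs: f(*xs[:n]) * g(*xs[n:n + l])

def alternate(h, k):
    """Omega h: (1/k!) times the signed sum of h over all orderings of its k arguments."""
    def omega_h(*xs):
        total = sum(sgn(p) * h(*(xs[i] for i in p)) for p in permutations(range(k)))
        return Fraction(total, factorial(k))
    return omega_h

def wedge(f, n, g, l):
    """f ^ g = C(n+l, n) * Omega(f (x) g) for f of order n and g of order l."""
    w = alternate(tensor(f, n, g, l), n + l)
    return lambda *xs: comb(n + l, n) * w(*xs)

def covector(a):
    """The linear functional x -> sum of a_i * x_i."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x))

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
f, g, h = covector((1, 2, 0)), covector((0, 1, 3)), covector((1, 0, 1))

fg = wedge(f, 1, g, 1)
gf = wedge(g, 1, f, 1)
assert gf(e1, e2) == -fg(e1, e2)                 # l = n = 1: sign (-1)^1
assert wedge(f, 1, f, 1)(e1, e2) == 0            # lambda ^ lambda = 0
# A 2-form and a 1-form commute, since (-1)^(2*1) = +1.
assert wedge(fg, 2, h, 1)(e1, e2, e3) == wedge(h, 1, fg, 2)(e1, e2, e3)
```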


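Corollary. If $\{\lambda_i\}_1^n \subset V^*$, then $\lambda_1 \wedge \cdots \wedge \lambda_n = 0$ if and only if the
sequence $\{\lambda_i\}_1^n$ is dependent.


Proof. If $\{\lambda_i\}$ is independent, it can be extended to a basis for $V^*$, and then
$\lambda_1 \wedge \cdots \wedge \lambda_n$ is some basis vector $v_q$ of $\mathfrak{a}^n$ by the above corollary. In
particular, $\lambda_1 \wedge \cdots \wedge \lambda_n \ne 0$.

If $\{\lambda_i\}$ is dependent, then one of its elements, say $\lambda_1$, is a linear combination of
the rest, $\lambda_1 = \sum_2^n c_i \lambda_i$, and $\lambda_1 \wedge \lambda_2 \wedge \cdots \wedge \lambda_n = \sum_{i=2}^n c_i \lambda_i \wedge (\lambda_2 \wedge \cdots \wedge \lambda_n)$.
The $i$th of these terms repeats $\lambda_i$, and so is 0 by the lemma and the above
corollary. □


Lemma 7.2. The mapping $<f, g> \mapsto f \wedge g$ is a bilinear mapping from
$\mathfrak{a}^n \times \mathfrak{a}^l$ to $\mathfrak{a}^{n+l}$.


Proof. This follows at once from the obvious bilinearity of $f \otimes g$. □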
We conclude with an important extension theorem. 


Theorem 7.3. Let $\theta$ be the alternating $n$-linear map

$$<\lambda_1, \dots, \lambda_n> \mapsto \lambda_1 \wedge \cdots \wedge \lambda_n$$

from $(V^*)^n$ to $\mathfrak{a}^n$. Then for any alternating $n$-linear functional $F(\lambda_1, \dots, \lambda_n)$
on $(V^*)^n$, there is a uniquely determined linear functional $G$ on $\mathfrak{a}^n$ such that
$F = G \circ \theta$. The mapping $G \mapsto F$ is thus a canonical isomorphism from
$(\mathfrak{a}^n)^*$ to $\mathfrak{a}^n(V^*)$.


Proof. The straightforward way to prove this is to define $G$ by establishing its
necessary values on a basis, using the equation $F = G \circ \theta$, and then to show
from the linearity of $G$, the alternating multilinearity of

$$\theta: <\lambda_1, \dots, \lambda_n> \mapsto \lambda_1 \wedge \cdots \wedge \lambda_n,$$

and the alternating multilinearity of $F$ that the identity $F = G \circ \theta$ holds
everywhere. This computation becomes notationally complex. Instead, we shall
be devious. We shall see that by proving more than the theorem asserts we get
a shorter proof of the theorem.

We consider the space $\mathfrak{a}^n(V^*)$ of all alternating $n$-linear functionals on $(V^*)^n$.
We know from Theorem 5.2 that $d(\mathfrak{a}^n(V^*)) = \binom{m}{n}$, since $d(V^*) = d(V) = m$.




Now for each functional $G$ in $(\mathfrak{a}^n)^*$, the functional $G \circ \theta$ is alternating and
$n$-linear, and so $G \mapsto F = G \circ \theta$ is a mapping from $(\mathfrak{a}^n)^*$ to $\mathfrak{a}^n(V^*)$ which is
clearly linear. Moreover it is injective, for if $G \ne 0$, then $F(\mu_{q(1)}, \dots, \mu_{q(n)}) =
G(v_q) \ne 0$ for some basis vector $v_q = \mu_{q(1)} \wedge \cdots \wedge \mu_{q(n)}$ of $\mathfrak{a}^n$. Since
$d(\mathfrak{a}^n(V^*)) = \binom{m}{n} = d(\mathfrak{a}^n) = d((\mathfrak{a}^n)^*)$, the mapping is an isomorphism (by
the corollary of Theorem 2.4, Chapter 2). In particular, every $F$ in $\mathfrak{a}^n(V^*)$ is
of the form $G \circ \theta$. □


It can be shown further that the property asserted in the above theorem is
an abstract characterization of $\mathfrak{a}^n$. By this we mean the following. Suppose that
a vector space $X$ and an alternating mapping $\varphi$ from $(V^*)^n$ to $X$ are given, and
suppose that every alternating functional $F$ on $(V^*)^n$ extends uniquely to a
linear functional $G$ on $X$ (that is, $F = G \circ \varphi$). Then $X$ is isomorphic to $\mathfrak{a}^n$, and
in such a way that $\varphi$ becomes $\theta$.

To see this we simply note that the hypothesis of unique extensibility is
exactly the hypothesis that $\Phi: G \mapsto F = G \circ \varphi$ is an isomorphism from $X^*$ to
$\mathfrak{a}^n(V^*)$. The theorem gave an isomorphism $\Theta$ from $(\mathfrak{a}^n)^*$ to $\mathfrak{a}^n(V^*)$, and the
adjoint $(\Phi^{-1} \circ \Theta)^*$ is thus an isomorphism from $X^{**}$ to $(\mathfrak{a}^n)^{**}$, that is, from $X$
to $\mathfrak{a}^n$. We won't check that $\varphi$ "becomes" $\theta$.

By virtue of Corollary 1 of Theorem 6.2, the identity $D(t) = D(t^*)$ is the
matrix form of the more general identity $\Delta(T) = \Delta(T^*)$, and it is interesting to
note the "coordinate free" proof of this equation. Here, of course, $T \in \operatorname{Hom} V$.

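We first note that the identity $(T^*\lambda)(\xi) = \lambda(T\xi)$ carries through the
definitions of $\otimes$ and $\wedge$ to give

$$T^*\lambda_1 \wedge \cdots \wedge T^*\lambda_n(\xi_1, \dots, \xi_n) = \lambda_1 \wedge \cdots \wedge \lambda_n(T\xi_1, \dots, T\xi_n). \tag{*}$$

Also, $\mathrm{ev}_{\xi}: <\lambda_1, \dots, \lambda_n> \mapsto \lambda_1 \wedge \cdots \wedge \lambda_n(\xi_1, \dots, \xi_n)$ is an alternating
$n$-linear functional on $(V^*)^n$ for each $\xi \in V^n$. The left member of (*) is thus
$\mathrm{ev}_{\xi}(T^*\lambda_1, \dots, T^*\lambda_n)$, and, if $n = \dim V$, this is $\Delta(T^*)\, \mathrm{ev}_{\xi}(\lambda_1, \dots, \lambda_n)$ by the
definition of $\Delta$. By the same definition the right side of (*) becomes

$$\Delta(T)\, \lambda_1 \wedge \cdots \wedge \lambda_n(\xi_1, \dots, \xi_n) = \Delta(T)\, \mathrm{ev}_{\xi}(\lambda_1, \dots, \lambda_n).$$

Thus (*) implies the identity $\Delta(T^*)\, \mathrm{ev}_{\xi} = \Delta(T)\, \mathrm{ev}_{\xi}$. Since $\mathrm{ev}_{\xi} \ne 0$ if $\xi =
\{\xi_i\}_1^n$ is independent, we have proved that $\Delta(T^*) = \Delta(T)$.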

We call a wedge product $\lambda_1 \wedge \cdots \wedge \lambda_n$ of functionals $\lambda_i \in V^*$ a
multivector. We saw above that $\lambda_1 \wedge \cdots \wedge \lambda_n \ne 0$ if and only if $\{\lambda_i\}_1^n$ is
independent, in which case $\{\lambda_i\}_1^n$ spans an $n$-dimensional subspace of $V^*$. The
following lemma shows that this geometric connection is not accidental.


Lemma 7.3. Two independent $n$-tuples $\{\lambda_i\}_1^n$ and $\{\mu_i\}_1^n$ in $V^*$ have the
same linear span if and only if $\mu_1 \wedge \cdots \wedge \mu_n = k(\lambda_1 \wedge \cdots \wedge \lambda_n)$ for
some $k$.


Proof. If $\{\mu_i\}_1^n \subset L(\{\lambda_i\}_1^n)$, then each $\mu_i$ is a linear combination of the $\lambda_j$'s, and
if we expand $\mu_1 \wedge \cdots \wedge \mu_n$ according to these basis expansions, we get
$k(\lambda_1 \wedge \cdots \wedge \lambda_n)$. If, furthermore, $\{\mu_i\}_1^n$ is independent, then $k$ cannot be zero.




Now suppose, conversely, that $\mu_1 \wedge \cdots \wedge \mu_n = k(\lambda_1 \wedge \cdots \wedge \lambda_n)$, where
$k \ne 0$. This implies first that $\{\mu_i\}_1^n$ is independent, and then that

$$\mu_i \wedge (\lambda_1 \wedge \cdots \wedge \lambda_n) = 0$$

for each $i$, so that each $\mu_i$ is dependent on $\{\lambda_j\}_1^n$. Together, these two
consequences imply that the set $\{\mu_i\}_1^n$ has the same linear span as $\{\lambda_i\}_1^n$. □


This lemma shows that a multivector has a relationship to the subspace it
determines like that of a single vector to its span.


8. EXTERIOR POWERS OF SCALAR PRODUCT SPACES 


Let $V$ be a finite-dimensional vector space, and let $( \ , \ )$ be a nondegenerate
(nonsingular) symmetric bilinear form on $V$. In this and the next section we shall
call any such bilinear form a scalar product, even though it may not be
positive definite. We know that the bilinear form $( \ , \ )$ induces an isomorphism
of $V$ with $V^*$ sending $y \in V$ into $\bar y \in V^*$, where $\bar y(x) = (x, y)$ for all
$x \in V$. We then get a nondegenerate form (scalar product), which we shall
continue to denote by $( \ , \ )$, on $V^*$ by setting $(\bar u, \bar v) = (u, v)$. We also obtain a
nondegenerate scalar product on $\mathfrak{a}^q$ by setting

$$(\bar u_1 \wedge \cdots \wedge \bar u_q,\ \bar v_1 \wedge \cdots \wedge \bar v_q) = \det\big((\bar u_i, \bar v_j)\big). \tag{8.1}$$

To check that (8.1) makes sense, we first remark that for fixed $\bar v_1, \dots, \bar v_q \in V^*$,
the right-hand side of (8.1) is an antisymmetric multilinear function of the
vectors $\bar u_1, \dots, \bar u_q$, and therefore extends to a linear function on $\mathfrak{a}^q(V^*)$ by
Theorem 7.3. Similarly, holding the $\bar u$'s fixed determines a linear function on
$\mathfrak{a}^q(V^*)$, and (8.1) is well defined and extends to a bilinear function on $\mathfrak{a}^q(V^*)$.
The right-hand side of (8.1) is clearly symmetric in $u$ and $v$, so that the bilinear
form we get is indeed symmetric. To see that it is nondegenerate, let us choose a
basis $u_1, \dots, u_n$ so that

$$(u_i, u_j) = \pm\delta_{ij}. \tag{8.2}$$

(We can always find such a basis by Theorem 7.1 of Chapter 2.) We know that
$\{\bar u_{\mathbf i}\} = \{\bar u_{i_1} \wedge \cdots \wedge \bar u_{i_q}\}$ forms a basis for $\mathfrak{a}^q$, where $\mathbf i = <i_1, \dots, i_q>$ ranges
over all $q$-tuplets of integers such that $1 \le i_1 < \cdots < i_q \le n$, and we claim
that

$$(\bar u_{\mathbf i}, \bar u_{\mathbf j}) = \pm\delta_{\mathbf i \mathbf j}. \tag{8.3}$$

In fact, if $\mathbf i \ne \mathbf j$, then $i_r \ne j_s$ for some value of $r$ between 1 and $q$ and for all $s$.
In this case one whole row of the matrix $((\bar u_{i_r}, \bar u_{j_s}))$ vanishes, namely, the $r$th row.
Thus (8.1) gives zero in this case. If $\mathbf i = \mathbf j$, then (8.2) says that the matrix has
$\pm 1$ down the diagonal and zeros elsewhere, establishing (8.3), and thus the fact
that $( \ , \ )$ is nondegenerate on $\mathfrak{a}^q$. In particular, we have

$$(\bar u_1 \wedge \cdots \wedge \bar u_n,\ \bar u_1 \wedge \cdots \wedge \bar u_n) = (-1)^{\#}, \tag{8.4}$$

where $\#$ is the number of minus signs occurring in (8.2).
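Formula (8.1) is a Gram-type determinant and is easy to evaluate in coordinates. The sketch below assumes vectors are given by their coordinates in a basis satisfying (8.2), so that the scalar product is determined by a tuple `signs` of $\pm 1$ values; `form`, `det`, and `multivector_product` are hypothetical names.

```python
from itertools import permutations
from math import prod

def det(a):
    """Determinant as a signed sum over permutations (small matrices only)."""
    m = len(a)

    def sgn(p):
        return (-1) ** sum(p[i] > p[j] for i in range(m) for j in range(i + 1, m))

    return sum(sgn(p) * prod(a[p[c]][c] for c in range(m)) for p in permutations(range(m)))

def form(u, v, signs):
    """Scalar product with (u_i, u_j) = signs[i] * delta_ij in the chosen basis."""
    return sum(s * a * b for s, a, b in zip(signs, u, v))

def multivector_product(us, vs, signs):
    """(u_1 ^ ... ^ u_q, v_1 ^ ... ^ v_q) = det((u_i, v_j)), as in (8.1)."""
    return det([[form(u, v, signs) for v in vs] for u in us])

signs = (1, 1, -1)  # one minus sign, so # = 1
e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)

assert multivector_product([e1, e3], [e1, e3], signs) == -1      # a diagonal case of (8.3)
assert multivector_product([e1, e2], [e1, e3], signs) == 0       # distinct index sets
assert multivector_product([e1, e2, e3], [e1, e2, e3], signs) == (-1) ** 1  # (8.4)
```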




9. THE STAR OPERATOR 


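Let $V$ be a finite-dimensional vector space endowed with a nondegenerate scalar
product as in Section 8. The space $\mathfrak{a}^n$ is one-dimensional if $n$ is the dimension of
$V$. The induced scalar product on $\mathfrak{a}^n$ is nondegenerate, so that $(\bar u, \bar u)$ is either
always positive or always negative for all nonzero $\bar u \in \mathfrak{a}^n$. In particular, there
are exactly two $\bar u$'s in $\mathfrak{a}^n$ with $(\bar u, \bar u) = \pm 1$. Let us choose one of them and hold
it fixed for the remainder of this section. Geometrically, this amounts to choosing
an orientation on $V$. We thus have picked a

$$\bar u \in \mathfrak{a}^n \quad \text{with} \quad (\bar u, \bar u) = (-1)^{\#}. \tag{9.1}$$

Let $\bar v$ be some fixed element of $\mathfrak{a}^q$. Then for any $\bar w \in \mathfrak{a}^{n-q}$, $\bar v \wedge \bar w \in \mathfrak{a}^n$, and so
we can write $\bar v \wedge \bar w = f_{\bar v}(\bar w)\, \bar u$, where $f_{\bar v}(\bar w)$ depends linearly on $\bar w$. Since the
induced scalar product $( \ , \ )$ on $\mathfrak{a}^{n-q}$ is nondegenerate, there is a unique
element $*\bar v \in \mathfrak{a}^{n-q}$ such that $(\bar w, *\bar v) = f_{\bar v}(\bar w)$. To repeat, we have assigned a
$*\bar v \in \mathfrak{a}^{n-q}$ to each $\bar v \in \mathfrak{a}^q$ by setting

$$(\bar w, *\bar v)\, \bar u = \bar v \wedge \bar w. \tag{9.2}$$

We have thus defined a map, $*$, from $\mathfrak{a}^q$ to $\mathfrak{a}^{n-q}$. It is clear from (9.2) that this
map is linear. Let $u_1, \dots, u_n$ be a basis for $V$ satisfying (8.2) and also $\bar u =
\bar u_1 \wedge \cdots \wedge \bar u_n$, and construct the corresponding bases for the spaces $\mathfrak{a}^q$ and
$\mathfrak{a}^{n-q}$. Then $\bar u_{\mathbf i} \wedge \bar u_{\mathbf j} = 0$ if any $i_l$ occurring in the $q$-tuplet $\mathbf i$ also occurs in $\mathbf j$.
If no $i_l$ occurs in $\mathbf j$, then

$$\bar u_{\mathbf i} \wedge \bar u_{\mathbf j} = \epsilon_{\mathbf k}\, \bar u, \qquad \mathbf k = <i_1, \dots, i_q, j_1, \dots, j_{n-q}>,$$

where $\epsilon_{\mathbf k} = \operatorname{sgn} \mathbf k$.
If we compare this with (9.2) and (8.3), we see that

$$*\bar u_{\mathbf i} = \pm\epsilon_{\mathbf k}\, \bar u_{\mathbf j}, \tag{9.3}$$

where the sign is the same as that occurring in (8.3), i.e., the sign is positive or
negative according as the number of $j_l$ with $(\bar u_{j_l}, \bar u_{j_l}) = -1$ which appear in $\mathbf j$
is even or odd. Applying $*$ to (9.3), we see that

$$**v = (-1)^{q(n-q)+\#}\, v \quad \text{for all } v \in \mathfrak{a}^q. \tag{9.4}$$

Let $v$ and $w$ be elements of $\mathfrak{a}^q$. Then

$$(*v, *w)\, \bar u = v \wedge *w = (-1)^{q(n-q)}\, {*w} \wedge v = (-1)^{q(n-q)}\, (**w, v)\, \bar u.$$

If we apply (9.4), we see that

$$(*v, *w) = (-1)^{\#}\, (v, w). \tag{9.5}$$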
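On basis multivectors the star operator reduces to bookkeeping with index sets and signs, which makes (9.4) easy to test. The sketch below is an illustration under stated assumptions: indices are 0-based, a basis multivector is an increasing tuple of indices, the basis satisfies (8.2) with `signs[k]` equal to $(u_k, u_k) = \pm 1$, and the fixed $\bar u$ is $\bar u_1 \wedge \cdots \wedge \bar u_n$. The names `perm_sign`, `star_basis`, and `double_star_coefficient` are hypothetical.

```python
def perm_sign(seq):
    """Sign of the permutation listed in seq, by counting inversions."""
    n = len(seq)
    return (-1) ** sum(seq[a] > seq[b] for a in range(n) for b in range(a + 1, n))

def star_basis(i, signs):
    """Return (coef, j) with *u_i = coef * u_j, where j is the complementary index tuple."""
    n = len(signs)
    j = tuple(k for k in range(n) if k not in i)
    eps = perm_sign(i + j)      # u_i ^ u_j = eps * u
    norm_j = 1
    for k in j:
        norm_j *= signs[k]      # (u_j, u_j) is the product of the signs indexed by j
    # (u_j, *u_i) = eps forces coef * (u_j, u_j) = eps, and (u_j, u_j) is +1 or -1.
    return eps * norm_j, j

def double_star_coefficient(i, signs):
    """The scalar c with **u_i = c * u_i."""
    c1, j = star_basis(i, signs)
    c2, _ = star_basis(j, signs)
    return c1 * c2

minkowski = (1, 1, 1, -1)       # n = 4 with one minus sign, so # = 1
q, n, num_minus = 2, 4, 1
assert double_star_coefficient((0, 1), minkowski) == (-1) ** (q * (n - q) + num_minus)
```

In the positive definite three-dimensional case this gives `star_basis((0,), (1, 1, 1)) == (1, (1, 2))`, the familiar rule that the star of the first basis element is the wedge of the other two.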


CHAPTER 8 


INTEGRATION 


1. INTRODUCTION 


In this chapter we shall present a theory of integration in $n$-dimensional
Euclidean space $E^n$, which the reader will remember is simply Cartesian $n$-space
$\mathbb{R}^n$ together with the standard scalar product. Our main item of business is to
introduce a notion of size for subsets of $E^n$ (area in two dimensions, volume in
three, ...). Before proceeding to the formal definitions, let us see what properties
we would like our notion of size to have. We are looking for a function $\mu$ which
assigns a number $\mu(A)$ to bounded subsets $A \subset E^n$.


i) We would like $\mu(A)$ to be a nonnegative real number.
ii) If $A \subset B$, we would expect to have $\mu(A) \le \mu(B)$.


iii) If $A$ and $B$ are disjoint (that is, $A \cap B = \emptyset$), then we would expect to
have $\mu(A \cup B) = \mu(A) + \mu(B)$.


iv) Let $T$ be any Euclidean motion.* For any set $A$ let $TA$ be the set of all
points of the form $Tx$, where $x \in A$. We then would expect to have
$\mu(TA) = \mu(A)$. (Thus we want "congruent" sets to have the same size.)


v) We would expect a "lower-dimensional set" (where this is suitably
defined) to have zero size. Thus points in the line, curves in the plane,
surfaces in three-space, etc., should all have zero size.


vi) By the same token, we would expect open sets to have positive size.


In the above discussion we did not specify what kind of sets we were talking
about. One might be ambitious and try to assign a size to every subset of $E^n$.
This proves to be impossible, however, for the following reason: Let $U$ and $V$
be any two bounded open subsets of $E^3$. It can be shown† that we can find
decompositions

$$U = \bigcup_{i=1}^k U_i \quad \text{and} \quad V = \bigcup_{i=1}^k V_i$$

with $U_i \cap U_j = \emptyset = V_i \cap V_j$ for $i \ne j$, and Euclidean motions $T_i$ with
$T_i U_i = V_i$. In other words, we can break up $U$ into finitely many pieces, move
these pieces around, and then recombine them to get $V$. Needless to say, the
sets $U_i$ will have to look very bad. A moment's reflection shows that if we wish
to assign a size to all subsets (including those like $U_i$), we cannot satisfy (ii),
(iii), (iv), and (vi). In fact, (iii) [repeated $(k - 1)$ times] implies that

$$\mu(U) = \sum_{i=1}^k \mu(U_i),$$

and (iv) implies that $\mu(U_i) = \mu(V_i)$. Thus $\mu(U) = \mu(V)$, or the size of any two
open sets would coincide. Since any open set contains two disjoint open sets,
this implies, by (ii), that $\mu(V) \ge 2\mu(V)$, so $\mu(V) = 0$, contradicting (vi).

* Recall that a Euclidean motion is an isometry of $E^n$ and can thus be represented as
the composition of a translation and an orthogonal transformation.

† S. Banach and A. Tarski, Sur la décomposition des ensembles de points en parties
respectivement congruentes, Fund. Math. 6, 244-277 (1924). R. M. Robinson, On the
decomposition of spheres, Fund. Math. 34, 246-260 (1947).

We are thus faced with a choice. Either we dispense with some of requirements
(i) through (vi) above, or we do not assign a size to every subset of $E^n$.
Since our requirements are reasonable, we prefer the second alternative. This
means, of course, that now, in addition to introducing a notion of size, we must
describe the class of "good" sets we wish to admit.

We shall proceed axiomatically, listing some "reasonable" axioms for a class
of subsets and a function $\mu$.


2. AXIOMS 


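Our axioms will concern a class $\mathcal{D}$ of subsets of $E^n$ and a function $\mu$ defined on $\mathcal{D}$.
(That is, $\mu(A)$ is defined if $A$ is a subset of $E^n$ belonging to our collection $\mathcal{D}$.)


I. $\mathcal{D}$ is a collection of subsets of $E^n$ such that:
$\mathcal{D}$1. If $A \in \mathcal{D}$ and $B \in \mathcal{D}$, then $A \cup B \in \mathcal{D}$, $A \cap B \in \mathcal{D}$, and $A - B \in \mathcal{D}$.
$\mathcal{D}$2. If $A \in \mathcal{D}$ and $T$ is a translation, then $TA \in \mathcal{D}$.
$\mathcal{D}$3. The set $\square_{\mathbf 0}^{\mathbf 1} = \{x : 0 \le x^i < 1\}$ belongs to $\mathcal{D}$.
II. The real-valued function $\mu$ has the following properties:
$\mu$1. $\mu(A) \ge 0$ for all $A \in \mathcal{D}$.
$\mu$2. If $A \in \mathcal{D}$, $B \in \mathcal{D}$, and $A \cap B = \emptyset$, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
$\mu$3. For any $A \in \mathcal{D}$ and any translation $T$, we have $\mu(TA) = \mu(A)$.
$\mu$4. $\mu(\square_{\mathbf 0}^{\mathbf 1}) = 1$.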

Before proceeding, some remarks about our axioms are in order. Axiom $\mathcal{D}$1
will allow us to perform elementary set-theoretical operations with the elements
of $\mathcal{D}$. Note that in Axioms $\mathcal{D}$2 and $\mu$3 we are only allowing translations, but in
our list of desired properties we wanted proper behavior with respect to all
Euclidean motions in (iv). The reason for this is that we shall show that for
"good" choices of $\mathcal{D}$, the axioms, as they stand, uniquely determine $\mu$. It will
then turn out that $\mu$ actually satisfies the stronger condition (iv), while we
assume the weaker condition $\mu$3 as an axiom.


Fig. 8.1 


Axiom $\mathcal{D}$3 guarantees that our theory is not completely trivial, i.e., the
collection $\mathcal{D}$ is not empty. Axiom $\mu$4 has the effect of normalizing $\mu$. Without it,
any $\mu$ satisfying $\mu$1, $\mu$2, and $\mu$3 could be multiplied by any nonnegative real
number, and the new function $\mu'$ so obtained would still satisfy our axioms.
In particular, $\mu$4 guarantees that we do not choose $\mu$ to be the trivial function
assigning to each $A$ the value zero.


Fig. 8.2 


Our program for the next few sections is to make some reasonable choices for
$\mathcal{D}$ and to show that for the given $\mathcal{D}$ there exists a unique $\mu$ satisfying $\mu$1 through $\mu$4.

An important elementary consequence of the $\mathcal{D}$, $\mu$-axioms that we shall
frequently use without comment is:


$\mu$5. If $A \subset \bigcup_1^n A_i$ and all the sets are in $\mathcal{D}$, then $\mu(A) \le \sum_1^n \mu(A_i)$.


Our beginning work will be largely combinatorial. We will first consider
(generalized) rectangles, which are just Cartesian products of intervals, and the
way in which a point inside a rectangle determines a splitting of the rectangle
into a collection of smaller rectangles, as indicated in Fig. 8.1. This is associated
with the fact that the intersection of any two rectangles is a rectangle and the
difference of two rectangles is a finite disjoint union of rectangles (see Fig. 8.2).




Fig. 8.3 


We call a set $A$ paved if it can be expressed as the union of a finite disjoint
collection $\mathfrak{p}$ of rectangles (a paving of $A$). It will follow from our combinatorial
considerations that the collection $\mathcal{D}_{\min}$ of all the paved sets satisfies Axioms $\mathcal{D}$1
through $\mathcal{D}$3 and is the smallest family that does: any other collection $\mathcal{D}$ satisfying
the axioms includes $\mathcal{D}_{\min}$. It will then follow that if $\mu$ satisfies $\mu$1 through $\mu$4 on
$\mathcal{D}_{\min}$, then it must have the natural value (the product of the lengths of the sides)
for a rectangle. This implies that $\mu$ is uniquely defined on $\mathcal{D}_{\min}$ by requirements
$\mu$1 through $\mu$4, since the value $\mu(A)$ for any paved set $A$ must be the sum of the
natural values for the rectangles in a paving of $A$. The existence of $\mu$ on $\mathcal{D}_{\min}$
thus depends on the crucial lemma that two different pavings of the set $A$ give
the same sum. (See Fig. 8.3.)

This comes down to the fact that the "intersection" of two pavings of $A$ is
a third paving "finer" than either, and the fact that when a single rectangle is
broken up, the natural values of $\mu$ for the pieces add up to $\mu$ for the fragmented
rectangle.

All these considerations are elementary but exceedingly messy in detail. We
give the proofs below for the reader to refer to in case of doubt, but he may
prefer to study only the definitions and statements of results and then to proceed
to Section 6.


3. RECTANGLES AND PAVED SETS 


We first introduce some notation and terminology. Let $\mathbf a = <a^1, \dots, a^n>$
and $\mathbf b = <b^1, \dots, b^n>$ be elements of $E^n$. By the rectangle $\square_{\mathbf a}^{\mathbf b}$ we shall mean
the set of all $x = <x^1, \dots, x^n>$ in $E^n$ with $a^i \le x^i < b^i$. Thus

$$\square_{\mathbf a}^{\mathbf b} = \{x : a^i \le x^i < b^i,\ i = 1, \dots, n\}. \tag{3.1}$$

Note that in order for $\square_{\mathbf a}^{\mathbf b}$ to be nonempty, we must have $a^i < b^i$ for all $i$.
In other words,

$$\square_{\mathbf a}^{\mathbf b} = \emptyset \quad \text{if } a^i \ge b^i \text{ for some } i. \tag{3.2}$$

In the plane ($n = 2$), for instance, our rectangles $\square_{\mathbf a}^{\mathbf b}$ correspond to ordinary
Euclidean rectangles whose sides are parallel to the axes. (We should perhaps use
an additional adjective and call our sets level rectangles, braced rectangles, or
something else, but for simplicity we shall just call them rectangles.) Note that
in the plane our rectangles include the left-hand and lower edges but not the
right-hand and upper ones (see Fig. 8.4).


Fig. 8.4


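For general $n$, if we set $\mathbf 1 = <1, \dots, 1>$, then our notation coincides with
that of $\mathcal{D}$3.

We now collect some elementary facts about rectangles. It follows
immediately from the definition (3.1) that if $\mathbf a = <a^1, \dots, a^n>$, $\mathbf b = <b^1, \dots, b^n>$,
etc., then

$$\square_{\mathbf a}^{\mathbf b} \cap \square_{\mathbf c}^{\mathbf d} = \square_{\mathbf e}^{\mathbf f}, \tag{3.3}$$

where

$$e^i = \max(a^i, c^i) \quad \text{and} \quad f^i = \min(b^i, d^i), \qquad i = 1, \dots, n.$$

(The reader should draw various different instances of this equation in the plane
to get the correct geometrical feeling.) Note that the case where $\square_{\mathbf a}^{\mathbf b} \cap \square_{\mathbf c}^{\mathbf d} = \emptyset$
is included in (3.3) by (3.2). Another immediate consequence of the definition
(3.1) is

$$T\square_{\mathbf a}^{\mathbf b} = \square_{T\mathbf a}^{T\mathbf b} \quad \text{for any translation } T. \tag{3.4}$$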
We will now establish some elementary results which will imply that any D 
satisfying Axioms D1 through D3 must contain all rectangles. 


Lemma 3.1. Any rectangle []? can be written as the disjoint union 


k 
re = U : where b, — a, © CM. 
r=1 
(What this says is thal any “big” rectangle can be written as a finite union 
of “small” ones.) 


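Proof. We may assume that $\square_{\mathbf a}^{\mathbf b} \ne \emptyset$ (otherwise take $k = 0$ in the union).
Thus $b^i > a^i$. In particular, if we choose the integer $m$ sufficiently large,
$(1/2^m)(\mathbf b - \mathbf a)$ will lie in $\square_{\mathbf 0}^{\mathbf 1}$.

By induction, it therefore suffices to prove that we can decompose $\square_{\mathbf a}^{\mathbf b}$ into
the disjoint union

$$\square_{\mathbf a}^{\mathbf b} = \bigcup_{s=1}^{2^n} \square_{\mathbf c_s}^{\mathbf d_s} \quad \text{with } \mathbf d_s - \mathbf c_s = \tfrac12(\mathbf b - \mathbf a). \tag{3.5}$$

(For then we can continue to subdivide until the rectangles we get are small
enough.)

We get this subdivision in the obvious way by choosing the vertex "in the
middle" of the rectangle and considering all rectangles obtained by cutting
$\square_{\mathbf a}^{\mathbf b}$ through this point by coordinate hyperplanes. To write down an explicit
formula, it will be convenient to use the set of all subsets of $\{1, \dots, n\}$ as an
indexing set, rather than the integers $1, \dots, 2^n$. Let $J$ denote an arbitrary
subset of $\{1, 2, \dots, n\}$. Let $\mathbf a_J = <a_J^1, \dots, a_J^n>$ and $\mathbf b_J = <b_J^1, \dots, b_J^n>$ be
given by

$$a_J^i = \begin{cases} \tfrac12(a^i + b^i) & \text{if } i \in J, \\ a^i & \text{if } i \notin J, \end{cases}
\qquad
b_J^i = \begin{cases} b^i & \text{if } i \in J, \\ \tfrac12(a^i + b^i) & \text{if } i \notin J. \end{cases}$$

Then any $x \in \square_{\mathbf a}^{\mathbf b}$ lies in one and only one $\square_{\mathbf a_J}^{\mathbf b_J}$. In other words,
$\square_{\mathbf a_J}^{\mathbf b_J} \cap \square_{\mathbf a_K}^{\mathbf b_K} = \emptyset$ if $J \ne K$ and $\bigcup_{\text{all } J} \square_{\mathbf a_J}^{\mathbf b_J} = \square_{\mathbf a}^{\mathbf b}$. (The case where $n = 2$ is shown in Fig. 8.5.)
Since $\mathbf b_J - \mathbf a_J = \tfrac12(\mathbf b - \mathbf a)$ for all $J$, we have proved the lemma. □

Fig. 8.5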
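The midpoint subdivision used in the proof can be sketched as follows. This is an illustration under an indexing assumption: $J$ is taken to be the set of coordinates (0-based) in which a piece occupies the upper half of the original rectangle. The names `subdivide` and `contains` are hypothetical.

```python
from itertools import combinations

def contains(a, b, x):
    """True when x lies in the half-open rectangle with corners a and b."""
    return all(ai <= xi < bi for ai, xi, bi in zip(a, x, b))

def subdivide(a, b):
    """Split the rectangle into 2^n half-size rectangles, indexed by subsets J of {0..n-1}."""
    n = len(a)
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    pieces = {}
    for size in range(n + 1):
        for J in combinations(range(n), size):
            # Coordinates in J use the upper half [mid, b); the others use [a, mid).
            a_J = tuple(mid[i] if i in J else a[i] for i in range(n))
            b_J = tuple(b[i] if i in J else mid[i] for i in range(n))
            pieces[J] = (a_J, b_J)
    return pieces

pieces = subdivide((0, 0), (2, 4))
assert len(pieces) == 2 ** 2
# Each point of the big rectangle lies in exactly one piece.
for x in [(0, 0), (1, 2), (1.5, 0.5), (0.25, 3.75)]:
    assert sum(contains(a_J, b_J, x) for a_J, b_J in pieces.values()) == 1
```

Because the rectangles are half-open, points on the cutting hyperplanes fall into exactly one piece, which is what makes the union disjoint.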


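We now observe that for any $\mathbf c \in \square_{\mathbf 0}^{\mathbf 1}$ we have, by (3.3),

$$\square_{\mathbf 0}^{\mathbf c} = \square_{\mathbf 0}^{\mathbf 1} \cap \square_{\mathbf c - \mathbf 1}^{\mathbf c}. \tag{3.6}$$

Let $T_v$ denote translation through the vector $v$. Then $\square_{\mathbf c - \mathbf 1}^{\mathbf c} = T_{\mathbf c - \mathbf 1}\square_{\mathbf 0}^{\mathbf 1}$ by
(3.4). Thus by Axioms $\mathcal{D}$2 and $\mathcal{D}$3 the rectangle $\square_{\mathbf c - \mathbf 1}^{\mathbf c}$ must belong to $\mathcal{D}$.
By (3.6) and Axiom $\mathcal{D}$1 we conclude that $\square_{\mathbf 0}^{\mathbf c} \in \mathcal{D}$ for any $\mathbf c \in \square_{\mathbf 0}^{\mathbf 1}$.

Observe that $T_{\mathbf a}\square_{\mathbf 0}^{\mathbf b - \mathbf a} = \square_{\mathbf a}^{\mathbf b}$ by (3.4). Thus

$$\square_{\mathbf a}^{\mathbf b} \in \mathcal{D} \quad \text{whenever } \mathbf b - \mathbf a \in \square_{\mathbf 0}^{\mathbf 1}.$$

If we now apply Lemma 3.1 we conclude that

$$\square_{\mathbf a}^{\mathbf b} \in \mathcal{D} \quad \text{for all } \mathbf a \text{ and } \mathbf b. \tag{3.7}$$

We make the following definition.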


Definition 3.1. A subset $S \subset E^n$ will be called a paved set if $S$ is the disjoint
union of finitely many rectangles.


We can then assert:


Proposition 3.1. Any $\mathcal{D}$ satisfying Axioms $\mathcal{D}$1 through $\mathcal{D}$3 must contain all
paved sets. Let $\mathcal{D}_{\min}$ denote the collection of all finite unions of rectangles;
then $\mathcal{D}_{\min}$ satisfies Axioms $\mathcal{D}$1 through $\mathcal{D}$3.


Proof. We have already proved the first part of this proposition. We leave the
second part as an exercise for the reader. □




4. THE MINIMAL THEORY 


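We are now going to see how far $\mu$ is determined by Axioms $\mu$1 through $\mu$4.
In fact, we are going to show that $\mu(\square_{\mathbf a}^{\mathbf b})$ is what it should be; i.e., if

$$\mathbf a = <a^1, \dots, a^n> \quad \text{and} \quad \mathbf b = <b^1, \dots, b^n>,$$

then we must have

$$\mu(\square_{\mathbf a}^{\mathbf b}) = \begin{cases} 0 & \text{if } \square_{\mathbf a}^{\mathbf b} = \emptyset, \\ (b^1 - a^1) \cdots (b^n - a^n) & \text{if } \square_{\mathbf a}^{\mathbf b} \ne \emptyset. \end{cases} \tag{4.1}$$

Axiom $\mu$4 says that (4.1) holds for the special case $\mathbf a = \mathbf 0$, $\mathbf b = \mathbf 1$. Examining
the proof of Lemma 3.1 shows that $\square_{\mathbf 0}^{\mathbf 1}$ can be written as the disjoint union of $2^n$
rectangles, all congruent (via translation) to $\square_{\mathbf 0}^{\mathbf{1/2}}$, where $\mathbf{1/2} = <\tfrac12, \dots, \tfrac12>$.
Axioms $\mu$2 and $\mu$3 then imply that

$$\mu(\square_{\mathbf 0}^{\mathbf{1/2}}) = \frac{1}{2^n}.$$

Repeating this argument inductively shows that

$$\mu(\square_{\mathbf 0}^{\mathbf{1/2^r}}) = \frac{1}{2^{nr}}, \quad \text{where } \mathbf{1/2^r} = \left< \frac{1}{2^r}, \dots, \frac{1}{2^r} \right>. \tag{4.2}$$

We shall now use (4.2) to verify (4.1). The idea is to approximate any rectangle
by unions of translates of cubes $\square_{\mathbf 0}^{\mathbf{1/2^r}}$.

Fig. 8.6

Observe that in proving (4.1) we need to consider only rectangles of the form
$\square_{\mathbf 0}^{\mathbf c}$. In fact, we take $\mathbf c = \mathbf b - \mathbf a$ and observe that

$$T_{-\mathbf a}(\square_{\mathbf a}^{\mathbf b}) = \square_{\mathbf 0}^{\mathbf c},$$

so Axiom $\mu$3 implies that $\mu(\square_{\mathbf a}^{\mathbf b}) = \mu(\square_{\mathbf 0}^{\mathbf c})$, and by definition $c^1 \cdots c^n =
(b^1 - a^1) \cdots (b^n - a^n)$. If $\square_{\mathbf 0}^{\mathbf c} = \emptyset$, then (4.1) is trivially true (from Axiom
$\mu$2). Suppose that $\square_{\mathbf 0}^{\mathbf c} \ne \emptyset$. Then $\mathbf c = <c^1, \dots, c^n>$ with $c^i > 0$ for all $i$.
For each $r$ there are $n$ integers $N^1, \dots, N^n$ such that (Fig. 8.6)

$$N^i/2^r \le c^i \le (N^i + 1)/2^r. \tag{4.3}$$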

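In what follows, let $k = \langle k_1, \dots, k_n \rangle$, $l = \langle l_1, \dots, l_n \rangle$, etc., denote
vectors with integral coordinates (i.e., the $k_i$'s are integers). Let us write
$k < l$ if $k_i < l_i$ for all $i$, and $k \le l$ if $k_i \le l_i$ for all $i$. Write $\mathbf{1} = \langle 1, \dots, 1 \rangle$. If $N = \langle N_1, \dots, N_n \rangle$, then it follows from (4.3)
and the definitions that

$\square_{2^{-r}k}^{2^{-r}(k+\mathbf{1})} \subset \square_0^c$ whenever $0 \le k < N$.

For any $k$ and $l$,

$\square_{2^{-r}k}^{2^{-r}(k+\mathbf{1})} \cap \square_{2^{-r}l}^{2^{-r}(l+\mathbf{1})} = \emptyset$ if $k \neq l$.

Since

$\mu(\square_{2^{-r}k}^{2^{-r}(k+\mathbf{1})}) = \frac{1}{2^{nr}}$

by (4.2) and Axiom $\mu$3 (each such cube is a translate of $\square_0^{2^{-r}\mathbf{1}}$), and

$\bigcup_{0 \le k < N} \square_{2^{-r}k}^{2^{-r}(k+\mathbf{1})} \subset \square_0^c$,

we conclude from Axioms $\mu$1 and $\mu$2 that

$\mu(\square_0^c) \ge \frac{1}{2^{nr}} \times$ (the number of $k$ satisfying $0 \le k < N$).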


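It is easy to see that there are $N_1 \cdot N_2 \cdots N_n$ such $k$, so that

$\mu(\square_0^c) \ge \frac{1}{2^{nr}} (N_1 \cdots N_n) = \left(\frac{N_1}{2^r}\right) \left(\frac{N_2}{2^r}\right) \cdots \left(\frac{N_n}{2^r}\right)$.

According to (4.3), $N_i/2^r \ge c^i - 1/2^r$, so we have

$\mu(\square_0^c) \ge \left(c^1 - \frac{1}{2^r}\right) \cdots \left(c^n - \frac{1}{2^r}\right)$. (4.4)

Similarly,

$\square_0^c \subset \bigcup_{0 \le k < N + \mathbf{1}} \square_{2^{-r}k}^{2^{-r}(k+\mathbf{1})}$, where $\mathbf{1} = \langle 1, \dots, 1 \rangle$,

and we conclude that

$\mu(\square_0^c) \le \left(c^1 + \frac{1}{2^r}\right) \cdots \left(c^n + \frac{1}{2^r}\right)$. (4.5)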


Letting $r \to \infty$ in (4.4) and (4.5) proves (4.1).
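The squeeze between (4.4) and (4.5) can be watched numerically. The following is a minimal sketch, not part of the original argument: the function name `dyadic_bounds` and the sample rectangle are illustrative assumptions, and exact rational arithmetic is used so that the inequalities hold without rounding.

```python
from fractions import Fraction
from math import floor, prod

def dyadic_bounds(c, r):
    """Lower and upper bounds for the content of the rectangle [0, c),
    obtained by counting dyadic cubes of side 1/2**r as in (4.4) and (4.5)."""
    scale = 2 ** r
    # N[i] is the integer with N[i]/2^r <= c[i] < (N[i] + 1)/2^r, as in (4.3).
    N = [floor(Fraction(ci) * scale) for ci in c]
    cube = Fraction(1, scale ** len(c))      # content of one cube, by (4.2)
    lower = cube * prod(N)                   # cubes contained in the rectangle
    upper = cube * prod(n + 1 for n in N)    # cubes that together cover it
    return lower, upper

c = [Fraction(1, 3), Fraction(5, 7)]         # hypothetical side lengths
exact = prod(c)                              # the value predicted by (4.1)
for r in (2, 6, 12):
    lo, hi = dyadic_bounds(c, r)
    assert lo <= exact <= hi
    print(r, float(lo), float(exact), float(hi))
```

As $r$ grows, the two bounds close in on the product of the side lengths, which is exactly the content (4.1) asserts.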

In deriving (4.1) we made use of Axiom $\mu$4. Examining our argument shows
that if $\mu'$ satisfied $\mu$1 through $\mu$3 but not $\mu$4, we could argue in the same manner
except that we would have to multiply everything by the fixed constant $\mu'(\square_0^{\mathbf{1}})$.
To sum up, we have proved:


Proposition 4.1. If $\mu$ satisfies Axioms $\mu$1 through $\mu$4, then the value of $\mu$
on any rectangle is uniquely determined and is given by (4.1). If $\mu'$ satisfies
$\mu$1 through $\mu$3, then for any rectangle $\square_a^b$,

$\mu'(\square_a^b) = K\mu(\square_a^b)$, where $K = \mu'(\square_0^{\mathbf{1}})$.


5. THE MINIMAL THEORY (Continued) 
We will now show that formula (4.1) extends to give a unique $\mu$ defined on $\mathcal{D}_{\min}$
so as to satisfy Axioms $\mu$1 through $\mu$4. We must establish essentially two facts.

1) Every union of rectangles can be written as a disjoint union of rectangles.

This will then allow us to use Axiom $\mu$2 to determine $\mu(A)$ for every $A \in \mathcal{D}_{\min}$,
by setting

$\mu(A) = \sum_i \mu(\square_{a_i}^{b_i})$

if $A$ is the disjoint union of the $\square_{a_i}^{b_i}$. Since $A$ might be written in another way
as a disjoint union of rectangles, this formula is not well defined until we establish that:

2) If $A = \bigcup_i \square_{a_i}^{b_i} = \bigcup_j \square_{a'_j}^{b'_j}$ are two representations of $A$ as a disjoint union
of rectangles, then

$\sum_i \mu(\square_{a_i}^{b_i}) = \sum_j \mu(\square_{a'_j}^{b'_j})$.


[Figs. 8.7 and 8.8: two rectangles in the plane and the grid points $\langle c_i^1, c_j^2 \rangle$ that they determine.]


We first introduce some notation. 


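Definition 5.1. A paving $\mathfrak{p}$ of $E^n$ is a finite collection of mutually disjoint
rectangles. The floor of this paving, denoted by $|\mathfrak{p}|$, is the union of all
rectangles belonging to $\mathfrak{p}$.


If $\mathfrak{p} = \{\square_{a_i}^{b_i}\}$ and $T$ is a translation, we set $T\mathfrak{p} = \{T\square_{a_i}^{b_i}\}$.

If $\mathfrak{p}$ and $\mathfrak{s}$ are two pavings, we say that $\mathfrak{s}$ is finer than $\mathfrak{p}$ (and write $\mathfrak{s} \prec \mathfrak{p}$)
if every rectangle of $\mathfrak{p}$ is a union of rectangles of $\mathfrak{s}$. It is clear that if $\mathfrak{s} \prec \mathfrak{p}$
and $\mathfrak{t} \prec \mathfrak{s}$, then $\mathfrak{t} \prec \mathfrak{p}$. Note also that $\mathfrak{s} \prec \mathfrak{p}$ implies $|\mathfrak{p}| \subset |\mathfrak{s}|$.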


Proposition 5.1. Let $\mathfrak{p}$ and $\mathfrak{s}$ be any two pavings. There exists a third
paving $\mathfrak{t}$ such that $\mathfrak{t} \prec \mathfrak{p}$ and $\mathfrak{t} \prec \mathfrak{s}$.


Proof. The idea of the proof is very simple. Each rectangle in $\mathfrak{p}$ or in $\mathfrak{s}$ determines
$2n$ hyperplanes (each hyperplane containing a face of the rectangle).
If we collect all these hyperplanes, they will "enclose" a number of rectangles.
We let $\mathfrak{t}$ consist of those rectangles in this collection which do not contain any
smaller rectangle. Figure 8.7 shows the case (for $n = 2$) where $\mathfrak{p}$ and $\mathfrak{s}$ each
contain one rectangle. Here $\mathfrak{t}$ contains nine rectangles.

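We now fill in the details of this argument. Let $c_1 = \langle c_1^1, \dots, c_1^n \rangle, \dots,$
$c_k = \langle c_k^1, \dots, c_k^n \rangle$ be all the vectors that occur in the description of the
rectangles of $\mathfrak{p}$ and $\mathfrak{s}$. (In other words, if $\square_a^b \in \mathfrak{p}$ or $\in \mathfrak{s}$, then $a$ and $b$ are among
the $c$'s.) Let $d_1, \dots, d_{k^n}$ be the vectors of the form $\langle c_{i_1}^1, \dots, c_{i_n}^n \rangle$, where the $i_j$'s
range independently from 1 to $k$ (so that there are $k^n$ of them). (See Fig. 8.8 for
the case where $n = 2$ and $\mathfrak{p}$ and $\mathfrak{s}$ consist of one rectangle each.) For each $d_i$
there is at most one smallest $d_{j(i)}$ such that $d_i < d_{j(i)}$. In fact, if

$d_i = \langle c_{i_1}^1, \dots, c_{i_n}^n \rangle$,

then set $d_{j(i)} = \langle c_{j_1}^1, \dots, c_{j_n}^n \rangle$, where, for each $l$, $c_{j_l}^l$ is the smallest of the numbers
$c_1^l, \dots, c_k^l$ that is greater than $c_{i_l}^l$ (if some coordinate has no such successor, then $d_{j(i)}$ does not exist).

Let $\mathfrak{t} = \{\square_{d_i}^{d_{j(i)}}\}$. Then $\mathfrak{t}$ is finer than $\mathfrak{p}$ and $\mathfrak{s}$. In fact, if $\square_a^b \in \mathfrak{p}$, say, then
$\square_a^b = \square_{d_\alpha}^{d_\beta}$ for suitable $\alpha$ and $\beta$ and

$\square_{d_\alpha}^{d_\beta} = \bigcup_{d_\alpha \le d_i,\; d_{j(i)} \le d_\beta} \square_{d_i}^{d_{j(i)}}$. (5.1)

To see this observe that if $x \in \square_{d_\alpha}^{d_\beta}$, then $d_\alpha \le x < d_\beta$. Choose a largest
$d_i \le x$. Then $d_i \le x < d_{j(i)}$, so $x \in \square_{d_i}^{d_{j(i)}}$. This proves the proposition. We
will later want to use the particular form of the $\mathfrak{t}$ we constructed to find additional
information. ∎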
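The grid construction in this proof is easy to carry out mechanically. The sketch below is an illustration under stated assumptions rather than the book's construction verbatim: rectangles are represented as hypothetical pairs `(a, b)` of coordinate tuples standing for the half-open rectangle from $a$ to $b$, and only those grid cells lying inside at least one of the given rectangles are kept, so the result is a paving whose floor is the union of the inputs.

```python
from itertools import product

def content(rect):
    """Content of the half-open rectangle (a, b): the product of its side lengths."""
    a, b = rect
    result = 1
    for lo, hi in zip(a, b):
        result *= max(hi - lo, 0)
    return result

def grid_refinement(rects):
    """Cut space along every face coordinate of the given rectangles and
    return the grid cells that lie inside at least one of them."""
    n = len(rects[0][0])
    # cuts[l] holds the sorted l-th coordinates of all corners a and b.
    cuts = [sorted({r[side][l] for r in rects for side in (0, 1)}) for l in range(n)]
    cells = []
    for idx in product(*(range(len(c) - 1) for c in cuts)):
        a = tuple(cuts[l][i] for l, i in enumerate(idx))
        b = tuple(cuts[l][i + 1] for l, i in enumerate(idx))
        # A grid cell is either contained in a given rectangle or disjoint from it.
        if any(all(ra[l] <= a[l] and b[l] <= rb[l] for l in range(n)) for ra, rb in rects):
            cells.append((a, b))
    return cells

p = ((0, 0), (2, 2))
s = ((1, 1), (3, 3))
cells = grid_refinement([p, s])
print(len(cells), sum(content(c) for c in cells))
```

For the two overlapping squares above, the nine grid cells of the full grid reduce to the seven that lie in the union, and their contents add up to the content of the union.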


We can now prove (1) and (2). 


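Lemma 5.1. Let $\mathfrak{p}_1, \dots, \mathfrak{p}_l$ be pavings. Then there exists a paving $\mathfrak{s}$
such that $|\mathfrak{s}| = |\mathfrak{p}_1| \cup \cdots \cup |\mathfrak{p}_l|$.


Proof. By repeated applications of Proposition 5.1 we can choose a paving $\mathfrak{t}$
which is finer than all the $\mathfrak{p}_i$'s. Then each $|\mathfrak{p}_i|$ is the union of suitable rectangles
of $\mathfrak{t}$. Let $\mathfrak{s}$ be the collection of all these rectangles occurring in all the $\mathfrak{p}_i$'s.
Then $|\mathfrak{s}| = |\mathfrak{p}_1| \cup \cdots \cup |\mathfrak{p}_l|$. ∎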


In particular, we have proved (1). More generally, we have shown that every
$A \in \mathcal{D}_{\min}$ is of the form $A = |\mathfrak{p}|$ for a suitable paving $\mathfrak{p}$. We now wish to turn
our attention to (2).


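Lemma 5.2. Let $c_1^1 < \cdots < c_{r_1}^1, \; \dots, \; c_1^n < \cdots < c_{r_n}^n$ be $n$
sequences of numbers. Then

$\mu\left(\square_{\langle c_1^1, \dots, c_1^n \rangle}^{\langle c_{r_1}^1, \dots, c_{r_n}^n \rangle}\right) = \sum_{1 \le i_1 < r_1, \; \dots, \; 1 \le i_n < r_n} \mu\left(\square_{\langle c_{i_1}^1, \dots, c_{i_n}^n \rangle}^{\langle c_{i_1+1}^1, \dots, c_{i_n+1}^n \rangle}\right)$.


Proof. In fact, $c_{r_i}^i - c_1^i = (c_2^i - c_1^i) + (c_3^i - c_2^i) + \cdots + (c_{r_i}^i - c_{r_i-1}^i)$, so that the
lemma follows from (4.1) when we multiply out all the factors. ∎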


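We now prove (2). Let $\mathfrak{p} = \{\square_{a_i}^{b_i}\}$ and $\mathfrak{s} = \{\square_{a'_j}^{b'_j}\}$, where $A = |\mathfrak{p}| = |\mathfrak{s}|$.
Let $\mathfrak{t}$ be the paving we constructed in the proof of Proposition 5.1. Let $\mathfrak{r}$
be the collection of those rectangles $\square_{d_k}^{d_{j(k)}}$ of $\mathfrak{t}$ such that $\square_{d_k}^{d_{j(k)}} \subset |\mathfrak{p}| = |\mathfrak{s}|$.
Then to prove (2) it suffices to show that

$\sum_{\square \in \mathfrak{p}} \mu(\square) = \sum_{\square \in \mathfrak{r}} \mu(\square) = \sum_{\square \in \mathfrak{s}} \mu(\square)$. (5.2)

Now each rectangle $\square_{a_i}^{b_i}$ is decomposed into rectangles of $\mathfrak{t}$ according to (5.1),
that is, $a_i = d_\alpha$, $b_i = d_\beta$, etc.
By construction of the $d$'s, this is exactly a decomposition of the type
described in Lemma 5.2. Thus (5.1) implies that

$\mu(\square_{a_i}^{b_i}) = \sum_{d_\alpha \le d_k, \; d_{j(k)} \le d_\beta} \mu(\square_{d_k}^{d_{j(k)}})$.

Summing over all $\square_{a_i}^{b_i}$ (and doing the same for $\mathfrak{s}$) proves (5.2). We can thus
state: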


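Theorem 5.1. Every $A \in \mathcal{D}_{\min}$ can be written as $A = |\mathfrak{p}|$. The number
$\mu(A) = \sum_{\square \in \mathfrak{p}} \mu(\square)$ does not depend on the choice of $\mathfrak{p}$. We thus get a
well-defined function $\mu$ on $\mathcal{D}_{\min}$. It satisfies Axioms $\mu$1 through $\mu$4. If $\mu'$ is
any other function on $\mathcal{D}_{\min}$ satisfying $\mu$1 through $\mu$3, then $\mu'(A) = K\mu(A)$,
where $K = \mu'(\square_0^{\mathbf{1}})$.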


Proof. The proof of the last two assertions of the theorem is easy and is left as an 
exercise for the reader. 


6. CONTENTED SETS 


Theorem 5.1 shows that our axioms are not vacuous. It does not provide us
with a satisfactory theory, however, because $\mathcal{D}_{\min}$ contains far too few sets.
In particular, it does not fulfill requirement (iii), since $\mathcal{D}_{\min}$ is not invariant
under rotations, except under very special ones. We are now going to remedy
this by repeating the arguments of Section 4; we are going to try to approximate
more general sets by sets whose $\mu$'s we know, i.e., by sets contained in $\mathcal{D}_{\min}$.
This idea goes back to Archimedes, who used it to find the areas of figures in
the plane.


Definition 6.1. Let $A$ be any subset of $E^n$. We say that $\mathfrak{p}$ is an inner paving
of $A$ if $|\mathfrak{p}| \subset A$. We say that $\mathfrak{s}$ is an outer paving of $A$ if $A \subset |\mathfrak{s}|$.


We list several obvious facts. 


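If $|\mathfrak{p}| \subset A \subset |\mathfrak{s}|$, then $\mu(|\mathfrak{p}|) \le \mu(|\mathfrak{s}|)$. (6.1)
If $|\mathfrak{p}| \subset A \subset |\mathfrak{s}|$, then $|T\mathfrak{p}| \subset TA \subset |T\mathfrak{s}|$. (6.2)
If $A_1 \cap A_2 = \emptyset$ and $|\mathfrak{p}_1| \subset A_1$, $|\mathfrak{p}_2| \subset A_2$,
then $\mathfrak{p}_1 \cup \mathfrak{p}_2$ is an inner paving of $A_1 \cup A_2$. (6.3)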


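Definition 6.2. For any bounded subset $A$ of $E^n$ let

$\mu_*(A) = \operatorname{lub}_{|\mathfrak{p}| \subset A} \mu(|\mathfrak{p}|)$

be called the inner content of $A$ and let

$\bar{\mu}(A) = \operatorname{glb}_{A \subset |\mathfrak{s}|} \mu(|\mathfrak{s}|)$

be called the outer content of $A$.


Note that since $A$ is bounded, there exists an $\mathfrak{s}$ with $A \subset |\mathfrak{s}|$. This shows that
$\bar{\mu}(A)$ is defined. This together with (6.1) shows that $\mu_*(A)$ is defined and that

$\mu_*(A) \le \bar{\mu}(A)$. (6.4)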
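As a concrete instance of inequality (6.4), one can pave the closed unit disc in the plane with dyadic squares. The sketch below is only an illustration: `disc_contents` is a hypothetical helper, squares are the half-open grid squares of side $1/2^r$, and the outer count includes every square whose closure meets the disc, which may add a few squares but still gives an outer paving.

```python
import math

def disc_contents(r):
    """Contents of an inner and an outer paving of the disc x^2 + y^2 <= 1
    by grid squares of side 1/2**r (coordinates are scaled by m = 2**r)."""
    m = 2 ** r
    inner = outer = 0
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            # Squared distance from the origin to the farthest corner of the square.
            far = max(i * i, (i + 1) ** 2) + max(j * j, (j + 1) ** 2)
            # Squared distance from the origin to the nearest point of the closed square.
            near_x = 0 if i <= 0 <= i + 1 else min(abs(i), abs(i + 1))
            near_y = 0 if j <= 0 <= j + 1 else min(abs(j), abs(j + 1))
            near = near_x ** 2 + near_y ** 2
            if far <= m * m:
                inner += 1      # square lies inside the disc
            if near <= m * m:
                outer += 1      # closed square meets the disc
    return inner / m ** 2, outer / m ** 2

for r in (3, 5, 7):
    lo, hi = disc_contents(r)
    print(r, lo, math.pi, hi)
```

The two numbers bracket $\pi$ and approach each other as $r$ increases, which is what it means for the disc to have equal inner and outer content.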




Definition 6.3. A set $A$ will be called contented if $\mu_*(A) = \bar{\mu}(A)$. We call
$\mu_*(A) = \bar{\mu}(A)$ the content of $A$ and denote it by $\mu(A)$.


Observe that every $A \in \mathcal{D}_{\min}$ is contented. In fact, if $A = |\mathfrak{p}|$, then $\mathfrak{p}$ is
both an inner and an outer paving of $A$. Thus $\mu_*(A) = \bar{\mu}(A) = \mu(|\mathfrak{p}|)$, and the
new definition of $\mu(A)$ coincides with the old one.

Our next immediate objective is to show that the collection of all contented 
sets fulfills Axioms D1 through D3. 


Proposition 6.1. A set $A$ is contented if and only if its boundary is contented
and has content zero.


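Proof. Suppose $A$ is contented. For any $\delta > 0$ we can find an inner paving $\mathfrak{p}$
and an outer paving $\mathfrak{s}$ such that $\mu(|\mathfrak{s}|) - \mu(|\mathfrak{p}|) < \delta/2$. We want to replace $\mathfrak{p}$ by
a close paving $\mathfrak{p}'$ with $|\mathfrak{p}'| \subset \operatorname{int} A$. To do this, we choose a small number $\eta$
and replace each rectangle $\square_a^b$ of $\mathfrak{p}$ by $\square_{a + \eta(b-a)}^{b - \eta(b-a)}$. We let $\mathfrak{p}_\eta$ be the collection
of all these rectangles. Then $|\mathfrak{p}_\eta| \subset \operatorname{int} |\mathfrak{p}|$, so $|\mathfrak{p}_\eta| \subset \operatorname{int} A$. Furthermore,
$\mu(|\mathfrak{p}_\eta|) = (1 - 2\eta)^n \mu(|\mathfrak{p}|)$, since the factor $(1 - 2\eta)$ is the decrease of each side
of each rectangle of $\mathfrak{p}$. Similarly, we replace $\mathfrak{s}$ by a slightly larger $\mathfrak{s}_\eta$, with
$\bar{A} \subset \operatorname{int} |\mathfrak{s}_\eta|$ and $\mu(|\mathfrak{s}_\eta|) \le (1 + 2\eta)^n \mu(|\mathfrak{s}|)$. By choosing $\eta$ sufficiently small, we
can thus arrange that $\mu(|\mathfrak{s}_\eta|) - \mu(|\mathfrak{p}_\eta|) < \delta$. Let $\mathfrak{t}$ be a paving which is finer
than $\mathfrak{s}_\eta$ and $\mathfrak{p}_\eta$, with $|\mathfrak{t}| = |\mathfrak{s}_\eta|$. Let $\mathfrak{r} \subset \mathfrak{t}$ consist of those rectangles of $\mathfrak{t}$
lying in $\operatorname{int} A$. Then $|\mathfrak{t}| = |\mathfrak{s}_\eta| \supset |\mathfrak{r}| \supset |\mathfrak{p}_\eta|$, so $\mu(|\mathfrak{t}|) - \mu(|\mathfrak{r}|) < \delta$. But
$\partial A \subset |\mathfrak{t} - \mathfrak{r}|$, so that $\bar{\mu}(\partial A) \le \mu(|\mathfrak{t} - \mathfrak{r}|) = \mu(|\mathfrak{t}|) - \mu(|\mathfrak{r}|) < \delta$. In other
words, $\mu(\partial A) = 0$.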

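Conversely, suppose that $\partial A$ has content zero. Let $\mathfrak{r}$ be an outer paving of
$\partial A$ with $\mu(|\mathfrak{r}|) < \varepsilon$. Let $\mathfrak{t}$ be a paving finer than $\mathfrak{r}$ and such that $A \subset |\mathfrak{t}|$.
Let $\mathfrak{p} \subset \mathfrak{t}$ consist of those rectangles contained in $A$. Let $\mathfrak{s} \subset \mathfrak{t}$ consist of those
rectangles lying in $|\mathfrak{p}| \cup |\mathfrak{r}|$. Then $\mu(|\mathfrak{s}|) \le \mu(|\mathfrak{p}|) + \mu(|\mathfrak{r}|) < \mu(|\mathfrak{p}|) + \varepsilon$. Furthermore,
$A \subset |\mathfrak{s}|$. In fact, let $x \in A$. Then $x \in \square$ for some $\square \in \mathfrak{t}$. If $\square \cap \partial A \neq \emptyset$,
then $\square \cap |\mathfrak{r}| \neq \emptyset$, so $\square \subset |\mathfrak{r}|$, since $\mathfrak{t}$ is a refinement of $\mathfrak{r}$. If $\square \cap \partial A = \emptyset$,
then every point of $\square$ must lie in $A$, so that $\square \subset |\mathfrak{p}|$. We have thus constructed
$\mathfrak{p}$ and $\mathfrak{s}$ with $|\mathfrak{p}| \subset A \subset |\mathfrak{s}|$ and $\mu(|\mathfrak{s}|) - \mu(|\mathfrak{p}|) < \varepsilon$. Since we can do this for any $\varepsilon$,
this implies that $A$ is contented. ∎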


Proposition 6.2. The union of any finite number of sets with content zero 
has content zero. If A C B and B has content zero, then so does A. 


Proof. The proof is obvious. 


Theorem 6.1. Let $\mathcal{D}_{\mathrm{con}}$ denote the collection of all contented sets. Then
$\mathcal{D}_{\mathrm{con}}$ satisfies Axioms D1 through D3, and the $\mu$ given in Definition 6.3 satisfies
$\mu$1 through $\mu$4. If $\mu'$ is any other function on $\mathcal{D}_{\mathrm{con}}$ satisfying $\mu$1 through
$\mu$3, then $\mu' = K\mu$, where $K = \mu'(\square_0^{\mathbf{1}})$.


Proof. Let us verify the axioms.

D1. For any $A$ and $B$,

$\partial(A \cup B) \subset \partial A \cup \partial B$ and $\partial(A \cap B) \subset \partial A \cup \partial B$.

By Proposition 6.1, if $A$ and $B$ are contented, then $\partial A$ and $\partial B$ have content
zero. Thus so do $\partial A \cup \partial B$, $\partial(A \cup B)$, and $\partial(A \cap B)$, by Proposition 6.2.
Hence $A \cup B$ and $A \cap B$ are contented.

D2. Follows immediately from (6.2).

D3. Is obvious.

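$\mu$2. If $A_1$ and $A_2$ are contented, we can find inner pavings $\mathfrak{p}_1$ and $\mathfrak{p}_2$ such
that $\mu(A_1) - \mu(|\mathfrak{p}_1|) < \varepsilon/2$ and $\mu(A_2) - \mu(|\mathfrak{p}_2|) < \varepsilon/2$. If $A_1 \cap A_2 = \emptyset$,
then $\mathfrak{p}_1 \cup \mathfrak{p}_2$ is an inner paving of $A_1 \cup A_2$, and so $\mu(A_1 \cup A_2) > \mu(A_1) + \mu(A_2) - \varepsilon$. Since $\varepsilon$ is arbitrary,

$\mu(A_1 \cup A_2) \ge \mu(A_1) + \mu(A_2)$.

On the other hand, let $\mathfrak{s}_1$ and $\mathfrak{s}_2$ be outer pavings of $A_1$ and $A_2$, respectively,
with $\mu(|\mathfrak{s}_1|) < \mu(A_1) + \varepsilon/2$ and $\mu(|\mathfrak{s}_2|) < \mu(A_2) + \varepsilon/2$.

Let $\mathfrak{t}$ be a paving with $|\mathfrak{t}| = |\mathfrak{s}_1| \cup |\mathfrak{s}_2|$. Then $\mathfrak{t}$ is an outer paving of
$A_1 \cup A_2$ and $\mu(|\mathfrak{t}|) \le \mu(|\mathfrak{s}_1|) + \mu(|\mathfrak{s}_2|)$. Thus $\mu(A_1 \cup A_2) \le \mu(|\mathfrak{t}|) <
\mu(A_1) + \mu(A_2) + \varepsilon$, or $\mu(A_1 \cup A_2) \le \mu(A_1) + \mu(A_2)$. These two inequalities
together give $\mu$2.

$\mu$1. Is obvious.

$\mu$3. Follows from (6.2) and Definition 6.3.

$\mu$4. We already know.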

The second part of the theorem follows from Theorem 5.1 and Definition 6.3.
In fact, we know that $\mu'(|\mathfrak{p}|) = K\mu(|\mathfrak{p}|)$, and (6.1) together with Axiom $\mu$2
implies that $\mu'(|\mathfrak{p}|) \le \mu'(A) \le \mu'(|\mathfrak{s}|)$. Since we can choose $\mathfrak{p}$ and $\mathfrak{s}$ to be
arbitrarily close approximations to $A$ (relative to $\mu$), we are done. ∎


Remark. It is useful to note that we have actually proved a little more than
what is stated in Theorem 6.1. We have proved, namely, that if $\mathcal{D}$ is any collection
of sets satisfying D1 through D3, such that $\mathcal{D}_{\min} \subset \mathcal{D} \subset \mathcal{D}_{\mathrm{con}}$, and if $\mu' : \mathcal{D} \to \mathbb{R}$
satisfies $\mu$1 through $\mu$3, then $\mu'(A) = K\mu(A)$ for all $A$ in $\mathcal{D}$, where $K = \mu'(\square_0^{\mathbf{1}})$.


7. WHEN IS A SET CONTENTED?


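We will now establish some useful criteria for deciding whether a given set is
contented.
Recall that a closed ball $B_x^r$ with center $x$ and radius $r$ is given by

$B_x^r = \{y : \|y - x\| \le r\}$. (7.1)

Note that, with $\mathbf{1} = \langle 1, \dots, 1 \rangle$,

$B_x^r \subset \square_{x - r\mathbf{1}}^{x + (r + \varepsilon)\mathbf{1}}$ for any $\varepsilon > 0$, (7.2)

and

$\square_{x - (r/\sqrt{n})\mathbf{1}}^{x + (r/\sqrt{n})\mathbf{1}} \subset B_x^r$. (7.3)

(See Fig. 8.9.)

If we combine (7.2) and (7.3), we see that any cube $C$ lies in a ball $B$ such
that $\bar{\mu}(B) \le 2^n (\sqrt{n})^n \mu(C)$ and that any ball $B$ lies in a cube $C$ such that $\mu(C) \le
3^n (\sqrt{n})^n \bar{\mu}(B)$.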


Lemma 7.1. Let $A$ be a subset of $E^n$. Then $A$ has content zero if and only
if for every $\varepsilon > 0$ there exist a finite number of balls $\{B_i\}$ covering $A$
with $\sum \bar{\mu}(B_i) < \varepsilon$.


Proof. If we have such a collection of covering balls, then by the above remark
we can enlarge each ball to a rectangle to get a paving $\mathfrak{p}$ such that $A \subset |\mathfrak{p}|$ and
$\mu(|\mathfrak{p}|) < 3^n (\sqrt{n})^n \varepsilon$. Therefore, $\bar{\mu}(A) = 0$ if we can always find the $\{B_i\}$.

Conversely, suppose $A$ has content 0. Then for any $\delta$ we can find an outer
paving $\mathfrak{p}$ with $\mu(|\mathfrak{p}|) < \delta$. For each rectangle $\square$ in the paving we can, by the
arguments of Section 4, find a finite number of cubes which cover $\square$ and whose
total content is as close as we like to $\mu(\square)$, say $< 2\mu(\square)$. By doing this for each
$\square \in \mathfrak{p}$, we have a finite number of cubes $\{\square_i\}$ covering $A$ with total content
less than $2\delta$. Then by our remark before the lemma each cube $\square_i$ lies in a ball
$B_i$ such that $\bar{\mu}(B_i) \le 2^n (\sqrt{n})^n \mu(\square_i)$, and so we have a covering of $A$ by balls $B_i$
such that $\sum \bar{\mu}(B_i) < 2^{n+1} (\sqrt{n})^n \delta$. If we take $\delta = \varepsilon / 2^{n+1} (\sqrt{n})^n$, we have the
desired collection of balls, proving the lemma. ∎


Recall that a map $\varphi$ of $U \subset E^n \to E^n$ is said to satisfy a Lipschitz condition
if there is a constant $K$ (called the Lipschitz constant) such that

$\|\varphi(y) - \varphi(x)\| \le K \|y - x\|$. (7.4)


Proposition 7.1. Let $A$ be a set of content zero with $A \subset U$, and let
$\varphi : U \to E^n$ satisfy a Lipschitz condition. Then $\varphi(A)$ has content zero.


Proof. The proof consists of applying both parts of Lemma 7.1. Since $A$ has
content zero, for any $\varepsilon > 0$ we can find a finite number of balls covering $A$ whose
total outer content is less than $\varepsilon / K^n$. By (7.4), $\varphi(B_x^r) \subset B_{\varphi(x)}^{Kr}$, so that the images
of the balls covering $A$ cover $\varphi(A)$ and have a total volume less than $\varepsilon$. ∎


Recall that if $\varphi$ is a (continuously) differentiable map of an open set $U$ into
$E^n$, then $\varphi$ satisfies a Lipschitz condition on any compact subset of $U$.
As a consequence of Proposition 7.1, we can thus state:


Proposition 7.2. Let $\varphi$ be a continuously differentiable map defined on an
open set $U$, and let $A$ be a bounded set of content zero with $\bar{A} \subset U$. Then
$\varphi(A)$ has content zero.


Let $A$ be any compact subset of $E^n$ lying entirely in the subspace given by
$x^n = 0$. Then $A$ has content zero. In fact, for some sufficiently large fixed $r$,
the set $A$ is contained in the rectangle

$\square_{\langle -r, \dots, -r, 0 \rangle}^{\langle r, \dots, r, \varepsilon \rangle}$ for any $\varepsilon > 0$,

which has arbitrarily small volume.




Now let $\psi : V \subset E^{n-1} \to E^n$ be a continuously differentiable map given by

$\langle y^1, \dots, y^{n-1} \rangle \mapsto \langle \psi^1(y^1, \dots, y^{n-1}), \dots, \psi^n(y^1, \dots, y^{n-1}) \rangle$.

Let $B$ be any bounded subset of $E^{n-1}$ with $\bar{B} \subset V$. We can then write $\psi(B) =
\varphi(A)$, where $A$ is the set of points in $E^n$ of the form $(y, 0)$, where $y \in B$, and
where $\varphi$ is a differentiable map such that

$\varphi(y^1, \dots, y^{n-1}, 0) = \psi(y^1, \dots, y^{n-1})$.

By Proposition 7.2 we see that $\mu(\psi(B)) = 0$. Thus,


Proposition 7.3. Let $\psi$ be a continuously differentiable map of $V \subset E^{n-1}$ into $E^n$, and
let $B$ be a bounded set such that $\bar{B} \subset V$. Then $\psi(B)$ has content zero.


We have thus recovered requirement (v) of Section 1.
An immediate consequence of Propositions 7.3 and 6.1 is:


Proposition 7.4. Let $A \subset E^n$ be such that $\partial A \subset \bigcup_i \psi_i(B_i)$, where each $\psi_i$ and
$B_i$ is as in Proposition 7.3. Then $A$ is contented.


This shows that every set “we can draw” is contented. 


Exercise. Show that every ball is contented. 


8. BEHAVIOR UNDER LINEAR DISTORTIONS
We shall continue to derive consequences of Proposition 7.1. 


Proposition 8.1. Let $\varphi$ be a one-to-one map of $U \to E^n$ which satisfies a
Lipschitz condition and is such that $\varphi^{-1}$ is continuous. If $A \subset U$ is contented,
then so is $\varphi(A)$.


Proof. Since $A$ is contented, $\partial A$ has content zero. By the conditions on $\varphi$,
we know that $\partial \varphi(A) = \varphi(\partial A)$. Thus $\partial \varphi(A)$ has content zero, and so $\varphi(A)$ is
contented. ∎


An immediate consequence of Proposition 8.1 is: 


Proposition 8.2. Let $L$ be a linear transformation of $E^n$. Then $LA$ is contented
whenever $A$ is contented.


Proof. If $L$ is nonsingular, Proposition 8.1 applies. If $L$ is singular, it maps all
of $E^n$ onto a proper subspace. Any such subspace is contained in the image of
$\{x : x^n = 0\}$ by a suitable linear transformation, and so $\mu(LA) = 0$ for any
contented $A$. ∎


Theorem 8.1. Let $L$ be a linear transformation of $E^n$. Then for any contented
$A$ we have

$\mu(LA) = |\det L| \, \mu(A)$. (8.1)


Proof. We can restrict our attention to nonsingular $L$, since we have already
checked Eq. (8.1) for $\det L = 0$. If $L$ is nonsingular, then $L$ carries the class of
contented sets into itself. Let us define $\mu'$ by $\mu'(A) = \mu(LA)$ for each $A \in \mathcal{D}_{\mathrm{con}}$.
We claim that $\mu'$ satisfies Axioms $\mu$1 through $\mu$3 on $\mathcal{D}_{\mathrm{con}}$.

In fact, $\mu$1 and $\mu$2 are obviously true; $\mu$3 follows from the fact that for any
translation $T_v$ we have $LT_v = T_{Lv}L$, so that

$\mu'(T_v A) = \mu(LT_v A) = \mu(T_{Lv} LA) = \mu(LA) = \mu'(A)$.

By Theorem 6.1 we thus conclude that

$\mu' = k_L \mu$,

where $k_L$ is some constant depending on $L$. We must show that $k_L = |\det L|$.
We first observe that if $O$ is an orthogonal transformation, then

$\mu(OA) = \mu(A)$.

In fact, we know that $\mu(OA) = k_O \mu(A)$. If we take $A$ to be the unit ball $B_0^1$,
then $OB_0^1 = B_0^1$, so $k_O = 1$.
Next we observe that $\mu(L_1 L_2 A) = k_{L_1} \mu(L_2 A) = k_{L_1} k_{L_2} \mu(A)$, so that

$k_{L_1 L_2} = k_{L_1} k_{L_2}$.

Now we recall that any nonsingular $L$ can be written as $L = PO$, where $P$
is a positive self-adjoint operator and $O$ is orthogonal. Thus $k_L = k_P$ and
$|\det L| = |\det P| \, |\det O| = |\det P|$, so we need only verify (8.1) for positive self-adjoint
linear transformations. Any such $P$ can be written as $P = O_1 D O_1^{-1}$,
where $O_1$ is orthogonal and $D$ is diagonal. Since $P$ is positive, all the eigenvalues
of $D$ are positive. Since $\det P = \det D$ and $k_P = k_D$, we need only verify (8.1)
for the case where $L$ is given by a diagonal matrix with positive eigenvalues
$\lambda_1, \dots, \lambda_n$. But then $L\square_0^{\mathbf{1}} = \square_0^{\langle \lambda_1, \dots, \lambda_n \rangle}$, so that

$\mu(L\square_0^{\mathbf{1}}) = \mu(\square_0^{\langle \lambda_1, \dots, \lambda_n \rangle}) = \lambda_1 \cdots \lambda_n = |\det L|$,

verifying (8.1). ∎
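Equation (8.1) can be checked numerically in the plane by paving the image of the unit square. This is a rough sketch under stated assumptions: `image_area` is a hypothetical helper, the matrix is given as two rows, and the content of the image is estimated by counting grid squares of side `h` whose centers pull back into the unit square, so the result is only accurate up to an error proportional to `h`.

```python
def image_area(L, h=1 / 200):
    """Estimate the content of L([0,1)^2) for a nonsingular 2x2 matrix L by
    counting grid squares of side h whose centers lie in the image."""
    (a, b), (c, d) = L
    det = a * d - b * c
    # The images of the four corners of the unit square give a bounding box.
    xs = [0, a, b, a + b]
    ys = [0, c, d, c + d]
    x0, y0 = min(xs), min(ys)
    nx = int((max(xs) - x0) / h) + 1
    ny = int((max(ys) - y0) / h) + 1
    count = 0
    for i in range(nx):
        for j in range(ny):
            x, y = x0 + (i + 0.5) * h, y0 + (j + 0.5) * h
            # Pull the center back with L^{-1} and test membership in [0,1)^2.
            u = (d * x - b * y) / det
            v = (-c * x + a * y) / det
            if 0 <= u < 1 and 0 <= v < 1:
                count += 1
    return count * h * h

L = ((2, 1), (0.5, 1.5))            # sample matrix with det L = 2.5
print(image_area(L), abs(2 * 1.5 - 1 * 0.5))
```

The determinant enters the helper only to invert $L$ for the membership test; the area itself comes from counting squares, so agreement with $|\det L|$ is a genuine check.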


Exercise. Let $v_1, \dots, v_n$ be vectors of $E^n$. By the parallelepiped spanned by $v_1, \dots, v_n$
we mean the set of all vectors of the form $\sum_{i=1}^n x^i v_i$, where $0 \le x^i \le 1$. Show that its
content is $|\det((v_i, v_j))|^{1/2}$.


9. AXIOMS FOR INTEGRATION 


So far we have shown that there is a unique $\mu$ defined for a large collection of
sets in $E^n$. However, we do not have an effective way to compute $\mu$, except in
very special cases. To remedy this we must introduce a theory of integration.
We first introduce some notation.


Definition 9.1. Let $f$ be any real-valued function on $E^n$. By the support of
$f$, denoted by $\operatorname{supp} f$, we shall mean the closure of the set where $f$ is not zero;
that is,

$\operatorname{supp} f = \overline{\{x : f(x) \neq 0\}}$.




Observe that

$\operatorname{supp}(f + g) \subset \operatorname{supp} f \cup \operatorname{supp} g$ (9.1)

and

$\operatorname{supp} fg \subset \operatorname{supp} f \cap \operatorname{supp} g$. (9.2)


We shall say that $f$ has compact support if $\operatorname{supp} f$ is compact. Equation (9.1)
[and Eq. (9.2) applied to constant $g$] shows that the set of all functions with
compact support forms a vector space.

Let $T$ be any one-to-one transformation of $E^n$ onto itself. For any function $f$
we denote by $Tf$ the function given by

$(Tf)(x) = f(T^{-1}x)$. (9.3)

Observe that if $T$ and $T^{-1}$ are continuous, then

$\operatorname{supp} Tf = T \operatorname{supp} f$. (9.4)


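Definition 9.2. Let $A$ be a subset of $E^n$. By the characteristic function of $A$,
denoted by $e_A$, we shall mean the function given by

$e_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$ (9.5)

Note that

$e_{A_1 \cap A_2} = e_{A_1} \cdot e_{A_2}$, (9.6)
$e_{A_1 \cup A_2} = e_{A_1} + e_{A_2} - e_{A_1 \cap A_2}$, (9.7)
$\operatorname{supp} e_A = \bar{A}$, (9.8)

and

$Te_A = e_{TA}$ (9.9)

for any one-to-one map $T$ of $E^n$ onto itself.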
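Identities (9.6), (9.7), and (9.9) are pointwise statements, so they can be spot-checked on a toy model. The sketch below makes simplifying assumptions that are not in the text: sets are finite sets of integers rather than subsets of $E^n$, and the helpers `indicator` and `transform` are illustrative names. It also shows why $Tf$ is built from the inverse map: $(Tf)(x) = f(T^{-1}x)$ is what makes $Te_A$ the characteristic function of $TA$.

```python
def indicator(A):
    """Characteristic function e_A of a finite set A of integers."""
    return lambda x: 1 if x in A else 0

def transform(T_inv, f):
    """Return Tf, defined by (Tf)(x) = f(T^{-1} x); T_inv is the inverse of T."""
    return lambda x: f(T_inv(x))

A1, A2 = {1, 2, 3, 4}, {3, 4, 5}
e1, e2 = indicator(A1), indicator(A2)
points = range(-10, 20)

# (9.6): the indicator of an intersection is the product of the indicators.
assert all(indicator(A1 & A2)(x) == e1(x) * e2(x) for x in points)
# (9.7): inclusion-exclusion for the indicator of a union.
assert all(indicator(A1 | A2)(x) == e1(x) + e2(x) - indicator(A1 & A2)(x) for x in points)

# (9.9) for the translation T(x) = x + 5, whose inverse is x - 5.
TA1 = {x + 5 for x in A1}
Te1 = transform(lambda x: x - 5, e1)
assert all(Te1(x) == indicator(TA1)(x) for x in points)
```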

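By a theory of integration on $E^n$ we shall mean a collection $\mathcal{F}$ of functions
and a rule $\int$ which assigns a real number $\int f$ to each $f \in \mathcal{F}$, subject to the following
axioms:

$\mathcal{F}$1. $\mathcal{F}$ is a vector subspace of the space of all bounded functions of compact support.

$\mathcal{F}$2. If $f \in \mathcal{F}$ and $T$ is a translation, then $Tf \in \mathcal{F}$.

$\mathcal{F}$3. $e_\square$ belongs to $\mathcal{F}$ for any rectangle $\square$.

$\int$1. $\int$ is a linear function on $\mathcal{F}$.

$\int$2. $\int Tf = \int f$ for any translation $T$.

$\int$3. If $f \ge 0$, then $\int f \ge 0$.

$\int$4. $\int e_{\square_0^{\mathbf{1}}} = 1$.

Note that the axioms imply that $\mathcal{F}$ contains all functions of the form
$e_{\square_1} + e_{\square_2} + \cdots + e_{\square_k}$ for any rectangles $\square_1, \dots, \square_k$. In particular, for any
paving $\mathfrak{p}$, the function $e_{|\mathfrak{p}|}$ must belong to $\mathcal{F}$.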




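Also note that from $\int$3 we have at once the stronger version:

$\int$3′. $f \le g \Rightarrow \int f \le \int g$, since then $g - f \ge 0$.

Proposition 9.1. Let $\mathcal{F}$, $\int$ be a system satisfying Axioms $\mathcal{F}$1 through $\mathcal{F}$3 and
$\int$1 through $\int$4. Then

$\int e_A = \mu(A)$ (9.10)

for every contented set $A$ such that $e_A \in \mathcal{F}$, and

$\int$5. $|\int f| \le \|f\|_\infty \, \bar{\mu}(\operatorname{supp} f)$ for every $f \in \mathcal{F}$.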


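Proof. The axioms guarantee that $e_A \in \mathcal{F}$ for every $A \in \mathcal{D}_{\min}$ and that $\nu(A) =
\int e_A$ satisfies $\mu$1 through $\mu$4. Therefore, $\int e_A = \mu(A)$ for every $A \in \mathcal{D}_{\min}$ by the
uniqueness of $\mu$ (Proposition 4.1). It follows that if $A$ is a contented set such
that $e_A \in \mathcal{F}$, and if $\mathfrak{p}$ and $\mathfrak{s}$ are inner and outer pavings of $A$, then

$\mu(|\mathfrak{p}|) = \int e_{|\mathfrak{p}|} \le \int e_A \le \int e_{|\mathfrak{s}|} = \mu(|\mathfrak{s}|)$.

Therefore, $\int e_A$ lies between $\mu_*(A)$ and $\bar{\mu}(A)$, and so equals $\mu(A)$. For any
$f \in \mathcal{F}$ and any $A \in \mathcal{D}_{\min}$ such that $\operatorname{supp} f \subset A$, we have $-\|f\|_\infty e_A \le f \le
\|f\|_\infty e_A$, and therefore $|\int f| \le \|f\|_\infty \mu(A)$ by $\int$3′ and (9.10). Taking the greatest
lower bound of the right side over all such sets $A$, we have $\int$5. ∎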


10. INTEGRATION OF CONTENTED FUNCTIONS 


We will now proceed to deal with Axioms $\mathcal{F}$ and $\int$ in the same way we dealt
with Axioms $\mathcal{D}$ and $\mu$. We will construct a "minimal" theory and then get a "big"
one by approximating. According to Proposition 9.1, the class $\mathcal{F}$ must contain
the function $e_{|\mathfrak{p}|}$ for any paving $\mathfrak{p}$. By $\mathcal{F}$1 it must therefore contain all linear
combinations of such.


Definition 10.1. By a paved function we shall mean a function $f = f_{\mathfrak{p}}$
given by

$f = \sum_{\square_i \in \mathfrak{p}} c_i e_{\square_i}$ (10.1)

for some paving $\mathfrak{p}$.

It is easy to see that the collection of all paved functions satisfies Axioms $\mathcal{F}$1
through $\mathcal{F}$3. Furthermore, by Proposition 9.1 and Axiom $\int$1 the integral, $\int$, is
uniquely determined on the class of all paved functions by

$\int f = \sum_i c_i \mu(\square_i)$ (10.2)

if $f$ is given by (10.1).

The reader should verify that if we let $\mathcal{F}_P$ be the class of all paved functions
and let $\int$ be given by (10.2), then all our axioms are satisfied. Don't forget to
show that $\int$ is well defined: if $f$ is expressed as in (10.1) in two ways, then the
sums given by (10.2) are equal.




The paved functions obviously form too small a collection of functions.
We would like to have an $\mathcal{F}$ including all continuous functions with compact
support and all characteristic functions of the form $e_A$ with $A$ contented, for
example.


Definition 10.2. A bounded function $f$ with compact support is said to be
contented if for any $\varepsilon > 0$ and $\delta > 0$ there exists a paved function $g = g_{\varepsilon,\delta}$
and a contented set $A = A_{\varepsilon,\delta}$ such that

$|f(x) - g(x)| < \varepsilon$ for all $x \notin A$ (10.3)

and

$\mu(A) < \delta$. (10.4)

The pair $\langle g, A \rangle$ will be called a paved $\varepsilon,\delta$-approximation to $f$.


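Let us verify that the collection of all contented functions, $\mathcal{F}_{\mathrm{con}}$, satisfies
Axioms $\mathcal{F}$1 through $\mathcal{F}$3. It is clear that if $f$ is contented, so is $af$ for any constant $a$.
If $f_1$ and $f_2$ are contented, let $\langle g_1, A_1 \rangle$ and $\langle g_2, A_2 \rangle$ be paved $\varepsilon,\delta$-approximations
to $f_1$ and $f_2$, respectively. Then

$|(f_1 + f_2)(x) - (g_1 + g_2)(x)| < 2\varepsilon$ for all $x \notin A_1 \cup A_2$,

and

$\mu(A_1 \cup A_2) < 2\delta$.

Thus $\langle g_1 + g_2, A_1 \cup A_2 \rangle$ gives a paved $2\varepsilon, 2\delta$-approximation to $f_1 + f_2$.

To verify $\mathcal{F}$2 we simply observe that if $\langle g, A \rangle$ is a paved $\varepsilon,\delta$-approximation
to $f$, then $\langle Tg, TA \rangle$ is one to $Tf$.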

A similar argument establishes the analogous result for multiplication: 


Proposition 10.1. Let $f_1$ and $f_2$ be two contented functions. Then $f_1 f_2$ is
contented.


Proof. Let $M$ be such that $|f_1(x)| < M$ and $|f_2(x)| < M$ for all $x$. Recall that
the product of two paved functions is a paved function. Using the same notation
as before, we have

$|f_1 f_2(x) - g_1(x) g_2(x)| \le |f_1(x)| \, |f_2(x) - g_2(x)| + |g_2(x)| \, |f_1(x) - g_1(x)|$
$< M\varepsilon + (M + \varepsilon)\varepsilon$ for all $x \notin A_1 \cup A_2$.

Thus $\langle g_1 g_2, A_1 \cup A_2 \rangle$ is a paved $(2M + \varepsilon)\varepsilon, 2\delta$-approximation to $f_1 f_2$. ∎

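As for $\mathcal{F}$3, it is immediate that a stronger statement is true:

Proposition 10.2. If $B$ is a contented set, then $e_B$ is a contented function.

Proof. In fact, let $\mathfrak{p}$ be an inner paving of $B$ with $\mu(B) - \mu(|\mathfrak{p}|) < \delta$. Then

$e_B(x) - e_{|\mathfrak{p}|}(x) = 0$ if $x \notin B - |\mathfrak{p}|$,

and

$\mu(B - |\mathfrak{p}|) < \delta$,

so $\langle e_{|\mathfrak{p}|}, B - |\mathfrak{p}| \rangle$ is a paved $\varepsilon,\delta$-approximation to $e_B$ for any $\varepsilon > 0$. ∎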




We now establish a useful alternative characterization of a contented function.


Proposition 10.3. A function $f$ is contented if and only if for every $\varepsilon$ there
are paved functions $h$ and $k$ such that $h \le f \le k$ and $\int (k - h) < \varepsilon$.


Proof. If $f$ is contented, let $R$ be a rectangle including $\operatorname{supp} f$. Let $\langle g, A \rangle$ be an
$\varepsilon,\delta$-approximation to $f$. Let $P$ be a paved set including $A = A_{\varepsilon,\delta}$ such that
$\mu(P) < \delta$, and let $m$ be a bound of $|f|$. Then $g - \varepsilon e_R - m e_P \le f \le g +
\varepsilon e_R + m e_P$, where the outside functions are clearly paved and the difference
of their integrals is less than $2\varepsilon\mu(R) + 2m\delta$. Since $\varepsilon$ and $\delta$ are arbitrary, we have
our $h$ and $k$. Conversely, if $h$ and $k$ are paved functions such that $h \le f \le k$
and $\int (k - h) < \alpha$, then the set where $k - h \ge \alpha^{1/2}$ is a paved set $A$. Furthermore,
$\alpha^{1/2} \mu(A) \le \int e_A (k - h) \le \int (k - h) < \alpha$, so that $\mu(A) < \alpha^{1/2}$. Given $\varepsilon$
and $\delta$, we only have to choose $\alpha \le \min(\varepsilon^2, \delta^2)$ and take $g$ as either $k$ or $h$ to see
that $f$ is contented. ∎
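The sandwich $h \le f \le k$ of Proposition 10.3 can be made concrete in one variable. The following sketch takes, as an assumption for illustration, $f(x) = x^2$ on $[0, 1)$ and zero elsewhere, and the paving of $[0, 1)$ by $m$ intervals of equal length; on each interval $h$ is the least value of $f$ and $k$ is an upper bound for $f$. Exact fractions are used so that the gap $\int (k - h)$ can be read off without rounding.

```python
from fractions import Fraction

def sandwich(m):
    """Integrals of paved functions h <= f <= k for f(x) = x^2 on [0, 1),
    using the paving of [0, 1) by m intervals of length 1/m."""
    width = Fraction(1, m)
    # On [i/m, (i+1)/m), f is increasing: h takes the value at the left end,
    # k takes the value at the right end (an upper bound for f on the cell).
    lower = sum(Fraction(i, m) ** 2 * width for i in range(m))
    upper = sum(Fraction(i + 1, m) ** 2 * width for i in range(m))
    return lower, upper

for m in (10, 100, 1000):
    lo, hi = sandwich(m)
    print(m, float(lo), float(hi), hi - lo)
```

In this example the gap works out to exactly $1/m$, so any prescribed $\varepsilon$ is met by taking $m > 1/\varepsilon$.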


Corollary. A function $f$ is contented if for every $\varepsilon$ there are contented
functions $f_1$ and $f_2$ such that $f_1 \le f \le f_2$ and $\int (f_2 - f_1) < \varepsilon$.


Proof. For then we can find paved functions $h \le f_1$ and $k \ge f_2$ such that
$\int (f_1 - h) < \varepsilon$ and $\int (k - f_2) < \varepsilon$ and end up with $h \le f \le k$ and $\int (k - h) < 3\varepsilon$. ∎


Theorem 10.1. Let $\mathcal{F}$ be a class of functions satisfying Axioms $\mathcal{F}$1 through $\mathcal{F}$3
and such that $\mathcal{F}_P \subset \mathcal{F} \subset \mathcal{F}_{\mathrm{con}}$. Then there exists a unique $\int$ satisfying Axioms
$\int$1 through $\int$4 on $\mathcal{F}$.


Proof. If $\int$ is any integral on $\mathcal{F}$ satisfying Axioms $\int$1 through $\int$4, then we must
have $\int f$ simultaneously equal to $\operatorname{lub} \int h$ for $h$ paved and $\le f$ and equal to $\operatorname{glb} \int k$
for $k$ paved and $\ge f$, by Proposition 10.3. The integral is thus uniquely determined
on $\mathcal{F}$. Moreover, it is easy to see that if the integral on $\mathcal{F}$ is defined by
$\int f = \operatorname{lub} \int h = \operatorname{glb} \int k$, then Axioms $\int$1 through $\int$4 follow from the fact that
they hold for the uniquely determined integral on the paved functions. ∎


Exercise 10.1. Let $f$ and $g$ be contented functions such that $f(x) = g(x)$ for $x \notin A$,
where $\mu(A) = 0$. Then $\int f = \int g$. (This shows that for the purpose of integration
we need to know a function only up to a set of content zero.)


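Definition 10.3. Let $f$ be a contented function and $A$ a contented set.
We call $\int e_A f$ the integral of $f$ over $A$ and denote it by $\int_A f$. Thus

$\int_A f = \int e_A f$. (10.5)

An immediate consequence of Axiom $\int$1 and (9.7) is

$\int_{A_1 \cup A_2} f = \int_{A_1} f + \int_{A_2} f - \int_{A_1 \cap A_2} f$. (10.6)


An immediate consequence of Exercise 10.1 is

$\left| \int_A f \right| \le \sup_{x \in A} |f(x)| \, \mu(A)$.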




We close this section by giving another useful characterization of contented 
functions. 


Proposition 10.4. Let $f$ be a bounded function with compact support.
Then $f$ is contented if and only if to every $\varepsilon > 0$ and $\delta > 0$ we can find an
$\eta > 0$ and a contented set $A_\delta$ such that $\mu(A_\delta) < \delta$ and


$|f(x) - f(y)| < \varepsilon$ whenever $\|x - y\| < \eta$ and $x, y \notin A_\delta$. (10.7)


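Proof. Suppose that for every $\varepsilon, \delta$ we can find $\eta$ and $A_\delta$. Let $\mathfrak{p} = \{\square_i\}$ be a
paving such that

i) $\operatorname{supp} f \subset |\mathfrak{p}|$;
ii) if $x, y \in \square_i$, then $\|x - y\| < \eta$;
iii) if $\mathfrak{s} = \{\square_i \in \mathfrak{p} : \square_i \cap A_\delta \neq \emptyset\}$, then $\mu(|\mathfrak{s}|) < 2\delta$.

Then let $f_{\varepsilon,2\delta}(x) = f(x_i)$ when $x \in \square_i$, where $x_i$ is some point of $\square_i$. By (ii)
and (iii), we see that $\langle f_{\varepsilon,2\delta}, |\mathfrak{s}| \rangle$ is a paved $\varepsilon, 2\delta$-approximation to $f$. Thus $f$ is
contented.

Conversely, suppose that $f$ is contented, and let $\langle f_{\varepsilon/2,\delta/2}, A_{\varepsilon/2,\delta/2} \rangle$ be a paved
approximation to $f$. Let $\mathfrak{p} = \{\square_i\}$ be the paving associated with $f_{\varepsilon/2,\delta/2}$.
Replace each $\square_i$ by the rectangle $\square_i'$ obtained by contracting $\square_i$ about
its center by a factor $(1 - \xi)$. (See Fig. 8.10.)
Thus $\mu(\square_i') = (1 - \xi)^n \mu(\square_i)$. For any $x, y \in \bigcup \square_i'$, if

$\|x - y\| < \eta$,

where $\eta$ is sufficiently small, then $x$ and $y$ belong to the same $\square_i$. If

$x, y \in \bigcup \square_i'$, $x, y \notin A_{\varepsilon/2,\delta/2}$, and $\|x - y\| < \eta$,

then

$|f(x) - f(y)| \le |f(x) - f_{\varepsilon/2,\delta/2}(x)| + |f(y) - f_{\varepsilon/2,\delta/2}(y)| + |f_{\varepsilon/2,\delta/2}(x) - f_{\varepsilon/2,\delta/2}(y)|$.

But the third term vanishes, so that $|f(x) - f(y)| < \varepsilon$. Now by first choosing $\xi$
sufficiently small, we can arrange that $\mu(|\mathfrak{p}| - \bigcup \square_i') < \delta/2$. Then we can
choose $\eta$ so small that $\|x - y\| < \eta$ implies that $x, y$ belong to the same $\square_i$ if
$x, y \in \bigcup \square_i'$. For this $\eta$ and for $A_\delta = A_{\varepsilon/2,\delta/2} \cup (|\mathfrak{p}| - \bigcup \square_i')$, Eq. (10.7) holds,
and $\mu(A_\delta) < \delta$. ∎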


In particular, a bounded function which is continuous except at a set of 
content zero and has compact support is contented. 


EXERCISES 


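10.2 Show that for any bounded set $A$, $e_A$ is a contented function if and only if $A$ is
a contented set.

10.3 Let $f$ be a contented function whose support is contained in a cube $\square$. For each $\delta$
let $\mathfrak{p}_\delta = \{\square_{i,\delta}\}_{i \in I_\delta}$ be a paving with $|\mathfrak{p}_\delta| = \square$ and whose cubes have diameter less
than $\delta$. Let $x_{i,\delta}$ be some point of $\square_{i,\delta}$. The expression

$\sum_{i \in I_\delta} f(x_{i,\delta}) \, \mu(\square_{i,\delta})$

is called a Riemann $\delta$-approximating sum for $f$. Show that for any $\varepsilon > 0$ there exists a
$\delta_0 \; [= \delta_0(f)] > 0$ such that

$\left| \int f - \sum_{i \in I_\delta} f(x_{i,\delta}) \, \mu(\square_{i,\delta}) \right| < \varepsilon$

whenever $\delta < \delta_0$.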
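A small numerical experiment may help fix what Exercise 10.3 asks for. The sketch below is an illustration under stated assumptions, not a solution: the function is taken to be $f(x, y) = xy$ on the unit square (zero elsewhere), the paving is the $m \times m$ grid of squares of side $1/m$, and the hypothetical parameter `tag` picks the sample point inside each square.

```python
def riemann_sum(f, m, tag=0.5):
    """Riemann sum of f over [0,1)^2 for the paving by m*m squares of side 1/m.
    `tag` in [0, 1) selects the sample point inside each square."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            x, y = (i + tag) * h, (j + tag) * h   # sample point in square (i, j)
            total += f(x, y) * h * h              # f(x_i) times the content of the square
    return total

f = lambda x, y: x * y       # its integral over the unit square is 1/4

for m in (10, 50, 200):
    print(m, riemann_sum(f, m, tag=0.0), riemann_sum(f, m, tag=0.9))
```

Whatever sample points are chosen, the sums settle near $1/4$ once the squares are small, which is the behavior the exercise asks the reader to prove in general.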


11. THE CHANGE OF VARIABLES FORMULA


This section will be devoted to the proof of the following theorem, which is of 
fundamental importance. 


Theorem 11.1. Let $U$ and $V$ be bounded open sets in $\mathbb{R}^n$, and let $\varphi$ be a
continuously differentiable one-to-one map of $U$ onto $V$ with $\varphi^{-1}$ differentiable.
Let $f$ be a contented function with $\operatorname{supp} f \subset V$. Then $(f \circ \varphi)$ is a contented function,
and


$\int_V f = \int_U (f \circ \varphi) \, |\det J_\varphi|$. (11.1)


Recall that if the map $\varphi$ is given by $y^i = \varphi^i(x^1, \dots, x^n)$, then $J_\varphi$ is the
linear transformation whose matrix is $[\partial \varphi^i / \partial x^j]$.
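Before the proof, formula (11.1) can be tried out on polar coordinates. The sketch below is an illustration with assumed data, none of which comes from the text: $\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$ maps $U = (1, 2) \times (0, \pi/2)$ onto a quarter annulus $V$, with $|\det J_\varphi| = r$, and $f(x, y) = x$ on a closed sector inside $V$ and zero elsewhere. Both integrals are approximated by midpoint sums over grids.

```python
import math

R0, R1 = 1.2, 1.8                        # radial range of the support of f
T0, T1 = math.pi / 8, 3 * math.pi / 8    # angular range of the support of f

def f(x, y):
    """f(x, y) = x on the sector phi([R0, R1] x [T0, T1]), zero elsewhere."""
    r, t = math.hypot(x, y), math.atan2(y, x)
    return x if (R0 <= r <= R1 and T0 <= t <= T1) else 0.0

def integral_over_V(h=1 / 200):
    """Midpoint sum for the left side of (11.1) over the square [0, 2)^2."""
    n = round(2 / h)
    return sum(f((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

def integral_over_U(m=200):
    """Midpoint sum for the right side of (11.1) over U = (1,2) x (0, pi/2)."""
    dr, dt = 1.0 / m, (math.pi / 2) / m
    total = 0.0
    for i in range(m):
        r = 1 + (i + 0.5) * dr
        for j in range(m):
            t = (j + 0.5) * dt
            x, y = r * math.cos(t), r * math.sin(t)   # the point phi(r, t)
            total += f(x, y) * r * dr * dt            # |det J_phi(r, t)| = r
    return total

exact = (R1 ** 3 - R0 ** 3) / 3 * (math.sin(T1) - math.sin(T0))
print(integral_over_V(), integral_over_U(), exact)
```

The first sum works in the $(x, y)$ coordinates of $V$ and the second in the $(r, \theta)$ coordinates of $U$; the extra factor $r$ in the second is exactly $|\det J_\varphi|$.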

Note that if $\varphi$ is a nonsingular linear transformation (so that $J_\varphi$ is just $\varphi$),
then Theorem 11.1 is an easy consequence of Theorem 8.1. In fact, for functions
of the form $e_A$ we observe that $e_A \circ \varphi = e_{\varphi^{-1}A}$, and Eq. (11.1) reduces, in this
case, to (8.1). By linearity, (11.1) is valid for all paved functions.

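Furthermore, $f \circ \varphi$ is contented. Suppose $|f(x) - f(y)| < \varepsilon$ when $\|x - y\| <
\eta$ and $x, y \notin A$, with $\mu(A) < \delta$. Then $|f \circ \varphi(u) - f \circ \varphi(v)| < \varepsilon$ when

$\|u - v\| < \eta / \|\varphi\|$ and $u, v \notin \varphi^{-1}(A)$,

with $\mu(\varphi^{-1}A) < \delta / |\det \varphi|$.

Now let $g_{\varepsilon,\delta}$, $A_{\varepsilon,\delta}$ be an approximating family of paved functions for $f$. Then
$|f \circ \varphi(x) - g_{\varepsilon,\delta} \circ \varphi(x)| < \varepsilon$ for $x \notin \varphi^{-1}(A_{\varepsilon,\delta})$ and $\mu(\varphi^{-1}A_{\varepsilon,\delta}) < \delta / |\det \varphi|$. Thus
$\int (g_{\varepsilon,\delta} \circ \varphi) |\det \varphi| \to \int (f \circ \varphi) |\det \varphi|$, and Eq. (11.1) is valid for all contented $f$.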

The proof of Theorem 11.1 for nonlinear maps is a bit more tricky. It consists
essentially of approximating $\varphi$ locally by linear maps, and we shall do it in
several steps. We shall use the uniform norm $\|x\|_\infty = \max |x^i|$ on $\mathbb{R}^n$. This is
convenient because a ball in this norm is actually a cube, although this nicety
isn't really necessary.




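Let $\psi$ be a (continuously) differentiable map defined on a convex open set $U$.
If the cube $\square = \square_{p - r\mathbf{1}}^{p + r\mathbf{1}}$ lies in $U$, then the mean-value theorem (Section 7,
Chapter 3) implies that for any $y \in \square$,

$\|\psi(y) - \psi(p)\|_\infty \le \|y - p\|_\infty \sup_{z \in \square} \|J_\psi(z)\|$.

Thus

$\psi(\square) \subset \square_{\psi(p) - Kr\mathbf{1}}^{\psi(p) + Kr\mathbf{1}}$, where $K = \sup_{z \in \square} \|J_\psi(z)\|$.

Thus

$\bar{\mu}(\psi(\square)) \le \left( \sup_{z \in \square} \|J_\psi(z)\| \right)^n \mu(\square)$. (11.2)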
Lemma 11.1. Let ¢ be as in Theorem 11.1. Then for any contented set A 
with A Cc U we have 
u(e(A)) < I, ldet J. (11.3) 


Proof. Let us apply Eq. (11.2) to the map ¥ = L71y, where ZL is a linear 
transformation. Then 


[det L]~"(e(D)} = w(L~"e()) < (up Way)" oO). 
Since Jz-ty = L—'J,, we get 
w(o(D)) S |det Li (sup |L~'V6(2)|/)" wD) (11.4) 


for any [[] contained in the domain of the definition of ¢ and for any lnear 
transformation L. 
For any $\epsilon > 0$, let $\delta$ be so small that $\|J_\varphi(x)^{-1}J_\varphi(y)\| < 1 + \epsilon$ for

\[ \|x - y\|_\infty < \delta \]

for all $x, y$ in a compact neighborhood of $A$. (It is possible to choose such a $\delta$,
since $J_\varphi(x)$ is a uniformly continuous function of $x$, so that $J_\varphi(x)^{-1}J_\varphi(y)$ is close
to the identity matrix when $x$ is close to $y$; see Section 8, Chapter 4.)

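Choose an outer paving $\mathfrak{p} = \{\square_i\}$ of $A$, where the $\square_i$ are cubes all having
edges of length less than $\delta$. Let $x_i$ be a point of $\square_i$. Then applying (11.4) to
each $\square_i$, taking $L = J_\varphi(x_i)$, we get

\[ \mu(\varphi(A)) \le \mu\Bigl(\varphi\bigl(\textstyle\bigcup \square_i\bigr)\Bigr) \le \sum \mu(\varphi(\square_i)) \le \sum |\det J_\varphi(x_i)| (1 + \epsilon)^n \mu(\square_i). \]

We can also suppose $\delta$ to have been taken small enough so that

\[ |\det J_\varphi(z)| > (1 - \epsilon)|\det J_\varphi(x_i)| \quad\text{for all } z \in \square_i \text{ and all } i. \]

Then we have

\[ \int_{\square_i} |\det J_\varphi| > (1 - \epsilon)|\det J_\varphi(x_i)|\,\mu(\square_i), \]

and so

\[ \mu(\varphi(A)) \le \frac{(1 + \epsilon)^n}{1 - \epsilon} \int_{|\mathfrak{p}|} |\det J_\varphi|. \]

Since $\epsilon$ is arbitrary and $\mathfrak{p}$ is an arbitrary outer paving of $A$, we get (11.3). □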




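We can now conclude that $f \circ \varphi$ is contented for any contented $f$ with
$\operatorname{supp} f \subset V$. In fact, let $K$ be chosen so large that it is a Lipschitz constant for
$\varphi$ on $\varphi^{-1}(\operatorname{supp} f)$, and so large that $K > |\det J_{\varphi^{-1}}(u)|$ for $u \in \operatorname{supp} f$. Now given
$\epsilon$ and $\delta$, we can find an $\eta$ such that

\[ |f(u) - f(v)| < \epsilon \quad\text{if } \|u - v\| < \eta \text{ and } u, v \notin A_\delta, \text{ with } \mu(A_\delta) < \delta. \]

But this implies that

\[ |f \circ \varphi(x) - f \circ \varphi(y)| < \epsilon \quad\text{if } \|x - y\| < \eta/K \text{ and } x, y \notin \varphi^{-1}(A_\delta), \]

where $\mu(\varphi^{-1}(A_\delta)) < K\delta$, by (11.3). Since $K$ was chosen independently of $\epsilon$
and $\delta$, this shows that $f \circ \varphi$ is contented.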


Lemma 11.2. Let $\varphi$, $U$, and $V$ be as in Theorem 11.1. Let $f$ be a nonnegative
contented function with $\operatorname{supp} f \subset V$. Then

\[ \int_V f \le \int_U (f \circ \varphi)|\det J_\varphi|. \qquad (11.5) \]


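Proof. Let $\langle g, A\rangle$ be a paved $\epsilon,\delta$-approximation to $f$ with $g(u) \le f(u)$ for all $u$.
If $\mathfrak{p} = \{\square_i\}$ is the paving associated with $g$, we may assume that $\operatorname{supp} f \subset |\mathfrak{p}|$.
Then, writing $g(\square_i)$ for the constant value of $g$ on $\square_i$ and applying (11.3) to the map $\varphi$ and the sets $\varphi^{-1}(\square_i)$,

\[ \int g = \sum g(\square_i)\,\mu(\square_i) \le \sum g(\square_i) \int_{\varphi^{-1}(\square_i)} |\det J_\varphi| = \int (g \circ \varphi)|\det J_\varphi| \le \int (f \circ \varphi)|\det J_\varphi|. \]

Since we can choose $g$ so that $\int g \to \int f$, we obtain (11.5). □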


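Lemma 11.3. Let $\varphi$, $U$, $V$, and $f$ be as in Theorem 11.1. Let $f$ be a
nonnegative function. Then Eq. (11.1) holds.


Proof. Let us apply (11.5) to the map $\varphi^{-1}$ and the function $(f \circ \varphi)|\det J_\varphi|$.
Since $J_\varphi(x) \circ J_{\varphi^{-1}}(\varphi(x)) = \mathrm{id}$, we obtain

\[ \int_U (f \circ \varphi)|\det J_\varphi| \le \int_V \bigl((f \circ \varphi) \circ \varphi^{-1}\bigr)\bigl(|\det J_\varphi| \circ \varphi^{-1}\bigr)|\det J_{\varphi^{-1}}| = \int_V f. \]

Combining this with (11.5) proves the lemma. □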


Completion of the proof of Theorem 11.1. Any real-valued contented function can
be written as the difference of two nonnegative contented functions. If for all $x$,
$f(x) > -M$ for some large $M$, we write $f = (f + Me_\square) - Me_\square$, where
$\operatorname{supp} f \subset \square$. Since we have verified Eq. (11.1) for nonnegative functions, and since
both sides of (11.1) are linear in $f$, we are done. Similarly, any bounded
complex-valued contented function $f$ can be written as $f = f_1 + if_2$, where $f_1$ and $f_2$ are
bounded real-valued contented functions. □




In practice, we sometimes may apply Eq. (11.1) to a situation where the
hypotheses of Theorem 11.1 are not, strictly speaking, verified. For instance, in
$\mathbb{R}^2$ we may want to introduce "polar coordinates". That is, we let $r, \theta$ be coordinates
on $\mathbb{R}^2$; if $S$ is the set $0 \le \theta < 2\pi$, $0 \le r$, we consider the map $\varphi\colon S \to \mathbb{R}^2$
given by $x = r\cos\theta$, $y = r\sin\theta$, where $x, y$ are coordinates on a second copy
of $\mathbb{R}^2$. Now this map is one-to-one and has positive Jacobian for $r > 0$. If we
consider the open sets $U \subset S$ given by $0 < r$, $0 < \theta < 2\pi$ and $V \subset \mathbb{R}^2$ given by
$V = \mathbb{R}^2 - \{\langle x, y\rangle : y = 0, x \ge 0\}$, the hypotheses of Theorem 11.1 are fulfilled,
and we can write (since $\det J_\varphi = r$)

\[ \int_V f = \int_U (f \circ \varphi)\,r \qquad (11.6) \]

if $\operatorname{supp} f \subset V$. However, Eq. (11.6) is valid without the restriction $\operatorname{supp} f \subset V$.
In fact, if $D_\epsilon$ is a strip of width $\epsilon$ about the ray $y = 0$, $x \ge 0$, then $f = fe_{D_\epsilon} + fe_{\mathbb{R}^2 - D_\epsilon}$
and $\int fe_{D_\epsilon} \to 0$ as $\epsilon \to 0$ (Fig. 8.11). Similarly, $\int (f \circ \varphi)\,r\,e_{\varphi^{-1}(D_\epsilon)} \to 0$,
so that (11.6) is valid for all contented $f$ by this simple limit argument.
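
As a numerical illustration of Eq. (11.6), the following sketch compares a midpoint Riemann sum of a function over the plane with the corresponding sum of $(f \circ \varphi)\,r$ over the $\langle r, \theta\rangle$-rectangle. The integrand `f`, the grid size `n`, and the function names are assumptions chosen for the illustration and are not taken from the text; the sums only approximate the integrals.

```python
import math

def f(x, y):
    # Hypothetical test integrand: 1 - x^2 - y^2 on the unit disk, 0 outside.
    return max(0.0, 1.0 - x * x - y * y)

def cartesian_integral(n=200):
    # Midpoint Riemann sum of f over the square [-1, 1] x [-1, 1].
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            total += f(x, y)
    return total * h * h

def polar_integral(n=200):
    # Midpoint Riemann sum of (f o phi)(r, theta) * r over 0 < r < 1, 0 < theta < 2*pi.
    dr = 1.0 / n
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            total += f(r * math.cos(theta), r * math.sin(theta)) * r
    return total * dr * dtheta

# Closed form for this particular f: 2*pi * integral_0^1 (1 - r^2) r dr = pi / 2.
exact = math.pi / 2.0
print(cartesian_integral(), polar_integral(), exact)
```

Both sums should agree with $\pi/2$ to within the discretization error of the grid.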


Fig. 8.11 


We will not state a general theorem covering all such useful extensions of 
Theorem 11.1. In each case the limit argument is usually quite straightforward 
and will be left to the reader. 


EXERCISES 


11.1 By the parallelepiped spanned by $v_1, \ldots, v_n$ we mean the set of all $v = \xi^1 v_1 +
\cdots + \xi^n v_n$, where $0 \le \xi^i \le 1$. Show that the content of this parallelepiped is given by
$|\det((v_i, v_j))|^{1/2}$.

11.2 Express the content of the ellipsoid
\[ \Bigl\{ x : \frac{(x^1)^2}{(a^1)^2} + \cdots + \frac{(x^n)^2}{(a^n)^2} \le 1 \Bigr\} \]
in terms of the content of the unit ball.


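11.3 Compute the Jacobian determinant of the map $\langle r, \theta\rangle \mapsto \langle x, y\rangle$, where
$x = r\cos\theta$, $y = r\sin\theta$.

11.4 Compute the Jacobian determinant of the map $\langle r, \theta, \varphi\rangle \mapsto \langle x, y, z\rangle$,
where $x = r\cos\varphi\sin\theta$, $y = r\sin\varphi\sin\theta$, $z = r\cos\theta$.


11.5 Compute the Jacobian determinant of the map $\langle r, \theta, z\rangle \mapsto \langle x, y, z\rangle$,
where $x = r\cos\theta$, $y = r\sin\theta$, $z = z$.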




12. SUCCESSIVE INTEGRATION


In the case of one variable, i.e., the theory of integration on $\mathbb{R}^1$, the fundamental
theorem of the calculus reduces the computation of the integral of a function to
the computation of its antiderivative. The generalization of this theorem to
$n$ dimensions will be presented in a later chapter. In this section we will show
how, in many cases, the computation of an $n$-dimensional integral can be reduced
to $n$ successive one-dimensional integrations.
Suppose we regard $\mathbb{R}^n$, in some fixed way, as the direct product $\mathbb{R}^n = \mathbb{R}^k \times \mathbb{R}^l$.
We shall write every $z \in \mathbb{R}^n$ as $z = \langle x, y\rangle$, where $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$.


Definition 12.1. We say that a contented function $f$ is contented relative to
the decomposition $\mathbb{R}^n = \mathbb{R}^k \times \mathbb{R}^l$ if there exists a set $A_f \subset \mathbb{R}^k$ of content
zero (in $\mathbb{R}^k$) such that


i) for each fixed $x \in \mathbb{R}^k$, $x \notin A_f$, the function $f(x, \cdot)$ is a contented function
on $\mathbb{R}^l$;

ii) the function $\int_l f$ which assigns to $x$ the number $\int_l f(x, \cdot)$ is a contented
function on $\mathbb{R}^k$.


It is easy to see that the set of all such functions satisfies Axioms $\mathfrak{F}1$ through
$\mathfrak{F}3$. (The only axiom that is not immediate is $\mathfrak{F}2$. But this is an easy consequence
of the fact that any translation $T$ can be written as $T_1T_2$, where $T_1$ is a
translation in $\mathbb{R}^k$ and $T_2$ is a translation in $\mathbb{R}^l$.)

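It is equally easy to verify that the rule which assigns to any such $f$ the
number

\[ \int_k \Bigl( \int_l f \Bigr) \]

satisfies Axioms $\int 1$ through $\int 4$. The only one which isn't immediately obvious
is $\int 3$. However, if $\mathfrak{p}$ is any paving with $\operatorname{supp} f \subset |\mathfrak{p}|$, then

\[ |f| \le \|f\|\,e_{|\mathfrak{p}|} \]

and

\[ \Bigl| \int_k \int_l f \Bigr| \le \int_k \int_l \|f\|\,e_{|\mathfrak{p}|} = \|f\| \int_k \int_l e_{|\mathfrak{p}|} = \|f\|\,\mu(|\mathfrak{p}|), \]

since

\[ \int_k \int_l e_\square = \mu(\square) \]

for any rectangle $\square$ (direct verification). Thus, by the uniqueness part of Theorem
10.1, we have

\[ \int f = \int_k \Bigl( \int_l f(\cdot, \cdot) \Bigr). \qquad (12.1) \]

Note, in particular, that if $f$ is also contented relative to the decomposition
$\mathbb{R}^n = \mathbb{R}^l \times \mathbb{R}^k$, then

\[ \int_k \int_l f = \int_l \int_k f. \]

In particular, for such $f$ the double integration is independent of the order.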
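
The independence of the order of integration can be checked numerically on a simple case. The sketch below is only an illustration under assumptions not in the text: the integrand $f(x, y) = xy$ on the triangle $0 \le y \le x \le 1$ (extended by zero), the helper `midpoint`, and the grid size are all hypothetical choices, and midpoint sums stand in for the integrals.

```python
def midpoint(g, a, b, n=200):
    # Midpoint Riemann sum approximating the integral of g over [a, b].
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    # Hypothetical integrand on the triangle 0 <= y <= x <= 1.
    return x * y

def x_outer():
    # Integrate in y first (0 <= y <= x), then in x.
    return midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, x), 0.0, 1.0)

def y_outer():
    # Integrate in x first (y <= x <= 1), then in y.
    return midpoint(lambda y: midpoint(lambda x: f(x, y), y, 1.0), 0.0, 1.0)

# Both orders should approximate the exact value 1/8.
print(x_outer(), y_outer())
```

Note that each order requires its own description of the domain: the limits of the inner integral depend on the outer variable.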


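In practice, all the functions that we shall come across will be contented
relative to any decomposition of $\mathbb{R}^n$. In particular, writing $\mathbb{R}^n = \mathbb{R}^1 \times \cdots \times \mathbb{R}^1$,
we have

\[ \int_{\mathbb{R}^n} f = \int \Bigl( \cdots \Bigl( \int f(\cdot, \ldots, \cdot) \Bigr) \cdots \Bigr). \qquad (12.2) \]

In terms of the rectangular coordinates $x^1, \ldots, x^n$, this last expression is usually
written as

\[ \int \Bigl( \cdots \Bigl( \int f(x^1, \ldots, x^n)\,dx^1 \Bigr) \cdots \Bigr)\,dx^n. \]

For this reason, the expression on the left-hand side of (12.2) is frequently
written as

\[ \int \cdots \int f(x^1, \ldots, x^n)\,dx^1 \cdots dx^n. \]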


Let us work out some simple examples illustrating the methods of integra- 
tion given in the previous sections. 


Example 1. Compute the volume of the intersection of the solid cone with vertex
angle $\alpha$ (vertex at 0) with the spherical shell $1 \le r \le 2$ (Fig. 8.12). By a Euclidean
motion we may assume that the axis of the cone is the $z$-axis. If we introduce
polar coordinates, we see that the set in question is the image of the set

\[ 1 \le r \le 2, \quad 0 \le \varphi \le 2\pi, \quad 0 \le \theta \le \alpha/2 \]

in the $\langle r, \varphi, \theta\rangle$-space (Fig. 8.13).


Fig. 8.12          Fig. 8.13


By the change of variables formula and Exercise 11.4 we see that the volume
in question is given by

\[
\begin{aligned}
\int_1^2 \int_0^{2\pi} \int_0^{\alpha/2} r^2 \sin\theta \,d\theta\,d\varphi\,dr
 &= 2\pi \int_1^2 \int_0^{\alpha/2} r^2 \sin\theta \,d\theta\,dr \\
 &= 2\pi \int_1^2 [1 - \cos(\alpha/2)]\,r^2\,dr \\
 &= 2\pi [1 - \cos(\alpha/2)]\bigl(\tfrac{8}{3} - \tfrac{1}{3}\bigr).
\end{aligned}
\]
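
The value found in Example 1 can be sanity-checked by carrying out the three successive integrations numerically. In this sketch the helper `midpoint`, the function names, and the sample angles are assumptions made for the illustration; the closed form is the one derived above, $2\pi[1 - \cos(\alpha/2)]\cdot\tfrac{7}{3}$.

```python
import math

def midpoint(g, a, b, n=400):
    # Midpoint Riemann sum approximating the integral of g over [a, b].
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def cone_shell_volume(alpha):
    # Successive integration of r^2 sin(theta) over
    # 0 <= theta <= alpha/2, 0 <= phi <= 2*pi, 1 <= r <= 2.
    theta_part = midpoint(math.sin, 0.0, alpha / 2.0)
    phi_part = midpoint(lambda phi: theta_part, 0.0, 2.0 * math.pi)
    return midpoint(lambda r: r * r * phi_part, 1.0, 2.0)

def closed_form(alpha):
    # Value obtained in Example 1.
    return 2.0 * math.pi * (1.0 - math.cos(alpha / 2.0)) * (8.0 / 3.0 - 1.0 / 3.0)

print(cone_shell_volume(math.pi / 2.0), closed_form(math.pi / 2.0))
```

For $\alpha = 2\pi$ the cone fills all of space and the formula reduces to the volume $\tfrac{4}{3}\pi(2^3 - 1^3)$ of the whole shell, which gives a second check.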




Example 2. Let $B$ be a contented set in the plane, and let $f_1$ and $f_2$ be two
contented functions defined on $B$. Let $A$ be the set of all $\langle x, y, z\rangle \in \mathbb{E}^3$ such that
$\langle x, y\rangle \in B$ and $f_1(x, y) \le z \le f_2(x, y)$. If $G$ is any contented function on $A$,
we can express the integral $\int_A G$ as

\[ \int_A G = \int_B \Bigl( \int_{f_1(x,y)}^{f_2(x,y)} G(x, y, z)\,dz \Bigr)\,dx\,dy. \]

For example, compute the integral $\int_A z$, where
$A$ is the set of all points in the unit ball lying
above the surface $z = x^2 + y^2$ (Fig. 8.14).
Thus


Fig. 8.14


\[ A = \{ \langle x, y, z\rangle : x^2 + y^2 + z^2 \le 1,\ z \ge x^2 + y^2 \}. \]

We must have $x^2 + y^2 \le a$, where $a^2 + a = 1$ [so that $a = (\sqrt{5} - 1)/2$], in
order for $\langle x, y, z\rangle$ to belong to $A$. Then $f_1(x, y) = x^2 + y^2$, $f_2(x, y) =
\sqrt{1 - (x^2 + y^2)}$, and

\[ \int_{f_1(x,y)}^{f_2(x,y)} z\,dz = \tfrac{1}{2}\bigl[ 1 - (x^2 + y^2) - (x^2 + y^2)^2 \bigr], \]

so that, using polar coordinates in the plane (and Exercise 11.3),

\[ \int_A z = \frac{1}{2} \int_{x^2 + y^2 \le a} \bigl[ 1 - (x^2 + y^2) - (x^2 + y^2)^2 \bigr] = \pi \int_0^{\sqrt{a}} r(1 - r^2 - r^4)\,dr. \]
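
A rough numerical check of Example 2 is sketched below. It sums the inner integral $\tfrac{1}{2}[1 - s - s^2]$, $s = x^2 + y^2$, over a Cartesian grid on the disk $s \le a$ and compares the result with the radial integral. The grid sizes and function names are assumptions for the illustration, and the closed form $\pi(a/2 - a^2/4 - a^3/6)$ is obtained here by integrating $r - r^3 - r^5$ term by term; it is not stated in the text.

```python
import math

A_PARAM = (math.sqrt(5.0) - 1.0) / 2.0  # the number a with a^2 + a = 1

def integral_cartesian(n=300):
    # Midpoint sum over [-1, 1]^2 of (1/2)(1 - s - s^2) on the disk s <= a.
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            s = x * x + y * y
            if s <= A_PARAM:
                total += 0.5 * (1.0 - s - s * s)
    return total * h * h

def integral_polar(n=2000):
    # Midpoint sum for pi * integral_0^sqrt(a) r (1 - r^2 - r^4) dr.
    b = math.sqrt(A_PARAM)
    h = b / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += r * (1.0 - r ** 2 - r ** 4)
    return math.pi * h * total

def closed_form():
    # Term-by-term antiderivative of r - r^3 - r^5 evaluated at r = sqrt(a).
    a = A_PARAM
    return math.pi * (a / 2.0 - a * a / 4.0 - a ** 3 / 6.0)

print(integral_cartesian(), integral_polar(), closed_form())
```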


As we saw in the last example, part of the problem of computing an integral
as an iterated integral is to determine a good description of the domain of
integration in terms of the decomposition of the vector space. It is usually a
great help in visualizing the situation to draw a figure.


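Example 3. Compute the volume enclosed by a surface of revolution. Here we are
given a function $f$ of one variable, and we consider the surface obtained by
rotating the curve $x = f(z)$, $z_1 \le z \le z_2$, around the $z$-axis (Fig. 8.15). We
thus wish to compute $\mu(A)$, where

\[ A = \{ \langle x, y, z\rangle : x^2 + y^2 \le f(z)^2,\ z_1 \le z \le z_2 \}. \]

Here it is obviously convenient to use cylindrical coordinates, and we see
that $A$ is the image of the set

\[ B = \{ \langle r, \theta, z\rangle : r \le f(z),\ 0 \le \theta < 2\pi,\ z_1 \le z \le z_2 \} \]

in the $\langle r, \theta, z\rangle$-space. By Exercise 11.5, we wish to compute

\[ \int_B r = \int_0^{2\pi} \int_{z_1}^{z_2} \int_0^{f(z)} r\,dr\,dz\,d\theta = 2\pi \int_{z_1}^{z_2} \Bigl( \int_0^{f(z)} r\,dr \Bigr)\,dz = 2\pi \int_{z_1}^{z_2} \frac{f(z)^2}{2}\,dz. \]

Thus
\[ \mu(A) = \pi \int_{z_1}^{z_2} f(z)^2\,dz. \]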
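
The formula of Example 3 is easy to try out on solids whose volumes are known. The following sketch approximates $\pi \int_{z_1}^{z_2} f(z)^2\,dz$ by a midpoint sum; the function name `volume_of_revolution` and the two test profiles (a unit ball and a cone of height 1 and base radius 1) are assumptions chosen for illustration.

```python
import math

def volume_of_revolution(f, z1, z2, n=1000):
    # Midpoint sum for pi * integral_{z1}^{z2} f(z)^2 dz.
    h = (z2 - z1) / n
    return math.pi * h * sum(f(z1 + (i + 0.5) * h) ** 2 for i in range(n))

# Unit ball: profile r = sqrt(1 - z^2) on [-1, 1]; expected volume 4*pi/3.
ball = volume_of_revolution(lambda z: math.sqrt(1.0 - z * z), -1.0, 1.0)

# Cone: profile r = z on [0, 1]; expected volume pi/3.
cone = volume_of_revolution(lambda z: z, 0.0, 1.0)

print(ball, 4.0 * math.pi / 3.0)
print(cone, math.pi / 3.0)
```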




Fig. 8.15 (generating curve $r = f(z)$)


EXERCISES


12.1 Compute the volume of the region between the surfaces $z = x^2 + y^2$ and
$z = x + y$.

12.2 Find the volume of the region in $\mathbb{E}^3$ bounded by the plane $z = 0$, the cylinder
$x^2 + y^2 = 2x$, and the cone $z = +\sqrt{x^2 + y^2}$.


12.3 Compute $\int_A (x^2 + y^2)^2\,dx\,dy\,dz$, where $A$ is the region bounded by the plane
$z = 2$ and the surface $x^2 + y^2 = 2z$.


12.4 Compute $\int_A x$, where
\[ A = \{ \langle x, y, z\rangle : x^2 + y^2 + z^2 \le a^2,\ x \ge 0,\ y \ge 0,\ z \ge 0 \}. \]


12.5 Compute 
| ay? RP 
(+ 45) : 


where A is the region bounded by the ellipsoid 
: 2 


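Let $\rho$ be a nonnegative function (to be called the density of mass in the following
discussion) defined on a domain $D$ in $\mathbb{E}^3$. The total mass of $\langle D, \rho\rangle$ is defined as
\[ M = \int_D \rho(x)\,dx. \]

If $M \ne 0$, the center of gravity of $\langle D, \rho\rangle$ is the point $C = \langle C_1, C_2, C_3\rangle$, where
\[ C_j = \frac{1}{M} \int_D x_j\,\rho(x)\,dx. \]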




12.6 A homogeneous solid (where $\rho$ is constant) is given by $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$,
and

\[ \frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} + \frac{x_3^2}{c^2} \le 1. \]

Find its center of gravity.

12.7 The unit cube has density $\rho(x) = x_1x_3$. Find its total mass and its center of
gravity.

12.8 Find the center of mass of the homogeneous body bounded by the surfaces
$x^2 + y^2 + z^2 = a^2$ and $x^2 + y^2 = ax$.


The notion of center of mass can, of course, be defined for a region in a Euclidean space
of any dimension. Thus, for a region $D$ in the plane with density $\rho$, the center of mass
will be the point $\langle x_0, y_0\rangle$, where

\[ x_0 = \frac{\int_D x\rho}{\int_D \rho} \quad\text{and}\quad y_0 = \frac{\int_D y\rho}{\int_D \rho}. \]
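
The planar definition can be illustrated with a short computation. In the sketch below the region is the unit square and the density $\rho(x, y) = x + y$ is a hypothetical choice made only for the example; for it the total mass is 1 and the center of mass is $\langle 7/12, 7/12\rangle$. Midpoint sums approximate the three integrals.

```python
def center_of_mass(rho, n=100):
    # Midpoint sums over the unit square for the mass and the two moments.
    h = 1.0 / n
    mass = 0.0
    moment_x = 0.0
    moment_y = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            w = rho(x, y) * h * h
            mass += w
            moment_x += x * w
            moment_y += y * w
    return mass, moment_x / mass, moment_y / mass

# Hypothetical density rho(x, y) = x + y on the unit square.
mass, x0, y0 = center_of_mass(lambda x, y: x + y)
print(mass, x0, y0)
```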


12.9 Let $D$ be a region in the $xz$-plane which lies entirely in the half-plane $x > 0$.
Let $A$ be the solid in $\mathbb{E}^3$ obtained by rotating $D$ about the $z$-axis. Show that $\mu(A) =
2\pi d\,\mu(D)$, where $d$ is the distance of the center of mass of the region $D$ (with uniform
density) from the $z$-axis. (Use cylindrical coordinates.) This is known as Guldin's rule.


Observe that in the definition of center of gravity we obtain a vector (i.e., a point
in $\mathbb{E}^3$) as the answer by integrating each of its coordinates. This suggests the following
definition: Let $V$ be a finite-dimensional vector space, and let $e_1, \ldots, e_k$ be a basis
for $V$. Call a map $f$ from $\mathbb{E}^n$ to $V$ ($f$ is a vector-valued function on $\mathbb{E}^n$ with values in $V$)
contented if when we write $f(x) = \sum f^i(x)e_i$, each of the (real-valued) functions $f^i$
is contented. Define the integral of $f$ over $D$ by

\[ \int_D f = \sum \Bigl( \int_D f^i \Bigr) e_i. \]


12.10 Show that the condition that a function be contented and the value of its
integral are independent of the choice of basis $e_1, \ldots, e_k$.

Let $\xi$ be a point not in the closed domain $D$, which has a
mass distribution $\rho$. The gravitational force on a particle of
unit mass situated at $\xi$ is defined to be the vector

\[ \int_D \frac{\rho(x)(x - \xi)}{\|x - \xi\|^3}\,dx \]

(here $x - \xi$ is an $\mathbb{E}^3$-valued function on $\mathbb{E}^3$).

Fig. 8.16


12.11 Let $D$ be the spherical shell bounded by two concentric spheres $S_1$ and $S_2$
(Fig. 8.16), with center at the origin. Let $\rho$ be a mass distribution on $D$ which depends
only on the distance from the center, that is, $\rho(x) = f(\|x\|)$. Show that the
gravitational force vanishes at any $\xi$ inside $S_1$.

12.12 $\langle D, \rho\rangle$ is as in Exercise 12.11. Show that the gravitational force on a point
outside $S_2$ is the same as that due to a particle situated at the origin and whose mass
is the total mass of $D$.




13. ABSOLUTELY INTEGRABLE FUNCTIONS 


Thus far we have been dealing with bounded functions of compact support. In
practice, we would like to be able to integrate functions which neither are
bounded nor have compact support. Let $f$ be a function defined on $\mathbb{E}^n$, let $M$ be
a nonnegative real number, and let $A$ be a (bounded) contented subset of $\mathbb{E}^n$.
Let $f_A^M$ be the function

\[ f_A^M(x) = \begin{cases} 0 & \text{if } x \notin A, \\ M & \text{if } x \in A \text{ and } |f(x)| > M, \\ f(x) & \text{if } x \in A \text{ and } |f(x)| \le M. \end{cases} \]

Thus $f_A^M$ is a bounded function of compact support. It is obtained from $f$ by
cutting $f$ back to zero outside $A$ and cutting $f$ back to $M$ when $|f(x)| > M$.
We say that a function $f$ is absolutely integrable if


i) $f_A^M$ is a contented function for all $M > 0$ and contented sets $A$; and


ii) for any $\epsilon > 0$ there is a bounded contented set $A_\epsilon$ such that $e_{A_\epsilon} \cdot f$ is
bounded and for all $M > 0$ and all $B$ with $B \cap A_\epsilon = \emptyset$,

\[ \int |f_B^M| < \epsilon. \]


It is easy to check that the sum of two absolutely integrable functions is again
absolutely integrable. Thus the set of absolutely integrable functions forms a
vector space. Note that if $f$ satisfies condition (i) and $|f(x)| \le |g(x)|$ for all $x$,
where $g$ is absolutely integrable, then $f$ is absolutely integrable.

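Let $f$ be an absolutely integrable function. Given any $\epsilon$, choose a corresponding
$A_\epsilon$. Then for any numbers $M_1$ and $M_2 > \max_{x \in A_\epsilon} |f(x)|$ and for any sets
$A_1 \supset A_\epsilon$ and $A_2 \supset A_\epsilon$,

\[ \Bigl| \int f_{A_1}^{M_1} - \int f_{A_2}^{M_2} \Bigr| \le \int \bigl( |f_{A_1 - A_\epsilon}^{M_1}| + |f_{A_2 - A_\epsilon}^{M_2}| \bigr) < 2\epsilon. \]

If we let $\epsilon \to 0$ and choose a corresponding family of $A_\epsilon$, then the above
inequality implies that $\lim \int f_{A_\epsilon}^{M}$ is independent of the choice of the $A_\epsilon$. We define
this limit to be $\int f$.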

We now list some very crude sufficient criteria for a function to be absolutely
integrable. We will consider the two different causes of trouble: nonboundedness
and lack of compact support.

Let $f$ be a bounded function with $f_A$ contented for any contented set $A$.
Suppose $|f(x)| \le C\|x\|^{-k}$ for large values of $\|x\|$. Let $B_r$ be the ball of radius $r$
centered at the origin. If $r_1$ is large enough so that the inequality holds for
$\|x\| \ge r_1$, then for $r_2 \ge r_1$ we have

\[ \int |f_{B_{r_2} - B_{r_1}}| \le C \int_{B_{r_2} - B_{r_1}} \|x\|^{-k} = C\Omega_n \int_{r_1}^{r_2} r^{n-1-k}\,dr, \]

where $\Omega_n$ is some constant depending on $n$ (in fact, it is the "surface area" of the
unit sphere in $\mathbb{E}^n$). If $k > n$, this last expression becomes

\[ \frac{C\Omega_n}{k - n}\bigl( r_1^{n-k} - r_2^{n-k} \bigr), \]

which is $\le [C\Omega_n/(k - n)]\,r_1^{n-k}$, which tends to zero as $r_1 \to \infty$ if $k > n$. Thus
we can assert:


Let $f$ be a bounded function such that $f_A$ is contented for any contented
set $A$. Suppose that $|f(x)| \to 0$ as $\|x\| \to \infty$ in such a way that $\|x\|^k |f(x)|$ is
bounded for some $k > n$. Then $f$ is absolutely integrable.


Now let us examine the situation when $f$ is of compact support but
unbounded. Suppose first that there is a point $x_0$ such that $f$ is bounded in the
complement of any neighborhood of $x_0$. Suppose, furthermore, that

\[ |f(x)| \le C\|x - x_0\|^{-k} \]

for some constants $C$ and $k$. Then if $|f(x)| > M$, $\|x - x_0\|^{-k} > M/C$ or
$\|x - x_0\| < (C/M)^{1/k}$.

Let $B_1$ be the ball of radius $(C/M_1)^{1/k}$ centered at $x_0$. Then $|f(x)| > M_1$
implies that $x \in B_1$. Furthermore, for $M_2 > M_1$ we have

\[ \int |f_{B_1}^{M_2}| \le C \int_{B_1 - B_2} \|x - x_0\|^{-k} + M_2\,\mu(B_2), \]

where $B_2$ is the ball of radius $(C/M_2)^{1/k}$ centered at $x_0$. Thus

\[ \int |f_{B_1}^{M_2}| \le C\Omega_n \int_{(C/M_2)^{1/k}}^{(C/M_1)^{1/k}} r^{n-1-k}\,dr + M_2\,C^{n/k} V_n M_2^{-n/k}, \]

where $\Omega_n$ and $V_n$ depend only on $n$. If $k < n$, the integral on the right becomes

\[ \frac{C^{(n-k)/k}}{n - k}\bigl( M_1^{(k-n)/k} - M_2^{(k-n)/k} \bigr) \le \frac{C^{(n-k)/k}}{n - k}\,M_1^{(k-n)/k}. \]

Thus

\[ \int |f_{B_1}^{M_2}| \le \text{const}\,\bigl( M_1^{(k-n)/k} + M_2^{(k-n)/k} \bigr), \]

which can be made arbitrarily small by choosing $M_1$ large.

Thus if $f$ has compact support and is such that $f^M$ is contented for all $M$ and
$|f(x)| \le C\|x - x_0\|^{-k}$ with $k < n$, then $f$ is absolutely integrable.

More generally, let $S$ be a bounded subset of an $l$-dimensional subspace of
$\mathbb{E}^n$. Let $d(x)$ denote the distance from $x$ to $S$. Let $f$ be a function of compact
support with $f^M$ contented for all $M$. If $|f(x)| \le C\,d(x)^{-k}$ with $k < n - l$,
then $f$ is absolutely integrable. The proof is similar to that given above and is left
to the reader.

Let $\{f_k\}$ be a sequence of absolutely integrable functions. Under what
conditions will $\int f_k \to \int f$ if $f_k(x) \to f(x)$? Even if the
sequence converges uniformly, there is no guarantee that the integrals converge.
For instance, if $f_k = (1/k^n)\,e_{\square_0^k}$, then $|f_k(x)| \le 1/k^n$, so that $f_k$ approaches zero
uniformly. On the other hand, $\int f_k = 1$ for all $k$.
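
The phenomenon is easy to see numerically in one dimension. The sketch below takes, as an assumption for the illustration, $f_k = (1/k)\,e_{[0,k)}$ on the line (the case $n = 1$, with the cube taken to be the interval from $0$ to $k$); the function names and the grid are hypothetical. The supremum $1/k$ tends to zero while the integral stays equal to 1.

```python
def f_k(k, x):
    # f_k = (1/k) times the characteristic function of [0, k).
    return 1.0 / k if 0.0 <= x < k else 0.0

def integral_of_f_k(k, cells_per_unit=100):
    # Midpoint sum over [0, k], which contains the support of f_k.
    n = k * cells_per_unit
    h = 1.0 / cells_per_unit
    return h * sum(f_k(k, (i + 0.5) * h) for i in range(n))

for k in (1, 10, 100):
    # Columns: k, sup |f_k|, approximate integral of f_k.
    print(k, 1.0 / k, integral_of_f_k(k))
```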




We say that a set of functions $\{f_k\}$ is uniformly absolutely integrable if for
any $\epsilon > 0$ there is an $A_\epsilon$ which can be chosen independently of $k$ such that

\[ \int |(f_k)_B^M| < \epsilon \quad\text{for all } M \]

whenever $B \cap A_\epsilon = \emptyset$.

We frequently verify that $\{f_k\}$ is uniformly absolutely integrable by showing
that there is an absolutely integrable function $g$ such that $|f_k(x)| \le |g(x)|$ for
all $k$ and $x$.

Let $\{f_k\}$ be a uniformly absolutely integrable sequence of functions. Suppose
that $f_k \to f$ uniformly. Suppose in addition that $f$ is absolutely integrable. Then
$\int f_k \to \int f$. In fact, for any $\delta > 0$ we can find a $k_0$ such that $|f_k(x) - f(x)| < \delta$
for all $k > k_0$ and all $x$. We can also find $A_\epsilon$ and $M_\epsilon$ such that

\[ \Bigl| \int f_k - \int f \Bigr| \le \Bigl| \int f_k - \int (f_k)_{A_\epsilon}^{M_\epsilon} \Bigr| + \Bigl| \int (f_k)_{A_\epsilon}^{M_\epsilon} - \int f_{A_\epsilon}^{M_\epsilon} \Bigr| + \Bigl| \int f_{A_\epsilon}^{M_\epsilon} - \int f \Bigr| \le \epsilon + \epsilon + \delta\mu(A_\epsilon), \]

which can be made arbitrarily small by first choosing $\epsilon$ small (which then gives
an $A_\epsilon$) and then choosing $\delta$ small (which means choosing $k_0$ large).

The main applications that we shall make of the preceding ideas will be to
the problems of computing iterated integrals and of differentiating under the
integral sign.


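Proposition 13.1. Let $f$ be a function on $\mathbb{R}^k \times \mathbb{R}^l$. Suppose that the set of
functions $\{f(x, \cdot)\}$ is uniformly absolutely integrable, where $x$ is restricted
to lie in a bounded contented set $K \subset \mathbb{R}^k$. Then the function $e_{K \times \mathbb{R}^l} f$ is
absolutely integrable, and

\[ \int e_{K \times \mathbb{R}^l} f = \int_K \int_{\mathbb{R}^l} f(x, y)\,dy\,dx = \int_{\mathbb{R}^l} \int_K f(x, y)\,dx\,dy. \]

Proof. By assumption, for any $\epsilon > 0$ we can find $M$ and $A_\epsilon \subset \mathbb{R}^l$ such that

\[ \int_l |f_B^M(x, \cdot)| < \epsilon \quad\text{if } B \cap A_\epsilon = \emptyset. \qquad (13.1) \]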
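Now for any set $B$ in $\mathbb{R}^n$,

\[ \int \bigl| (e_{K \times \mathbb{R}^l} f)_B^M \bigr| \le \int_K \int_l |f_B^M(x, \cdot)|\,dx \le \mu(K)\epsilon \quad\text{if } B \cap (K \times A_\epsilon) = \emptyset. \]

This shows that $e_{K \times \mathbb{R}^l} f$ is absolutely integrable on $\mathbb{R}^n$. Now choose a sufficiently
large $\square = \square_1 \times \square_2$ and an $M$ such that

\[ \Bigl| \int e_{K \times \mathbb{R}^l} f - \int (e_{K \times \mathbb{R}^l} f)_\square^M \Bigr| \le \epsilon \]

and also such that

\[ \Bigl| \int_l f(x, \cdot) - \int_l f_{\square_2}^M(x, \cdot) \Bigr| < \epsilon \quad\text{for all } x \in K. \]

Then we have

\[ \Bigl| \int_K \int_l f(x, y)\,dy\,dx - \int_K \int_l f_{\square_2}^M(x, y)\,dy\,dx \Bigr| \le \mu(K)\epsilon \]

and

\[ \int (e_{K \times \mathbb{R}^l} f)_\square^M = \int_K \int_l f_{\square_2}^M(x, y)\,dy\,dx. \]

Thus

\[ \Bigl| \int e_{K \times \mathbb{R}^l} f - \int_K \int_l f(x, y)\,dy\,dx \Bigr| \le \epsilon + \mu(K)\epsilon, \]

so that

\[ \int e_{K \times \mathbb{R}^l} f = \int_K \int_l f(x, y)\,dy\,dx. \]

Finally Eq. (13.1) shows that the function $F(y) = \int_K f(x, y)\,dx$ is absolutely
integrable. In fact, using the same $A_\epsilon$ and $M$ as in (13.1), we get

\[ \int |F_B^M| \le \mu(K)\epsilon. \]

Thus we get

\[ \int e_{K \times \mathbb{R}^l} f = \int_K \int_l f(x, y)\,dy\,dx = \int_l \int_K f(x, y)\,dx\,dy. \qquad \square \]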


An extension of the same argument shows the following. 


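Proposition 13.2. Let $f$ be absolutely integrable on $\mathbb{R}^n$ and such that the
functions $f(x, \cdot)$ are uniformly absolutely integrable for each $x \in \mathbb{R}^k$. Then

\[ \int_{\mathbb{R}^n} f = \int_{\mathbb{R}^k} \int_{\mathbb{R}^l} f(x, y)\,dy\,dx. \]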


We now turn our attention to the problem of differentiating under the 
integral sign. 


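Proposition 13.3. Let $(t, x) \mapsto F(t, x)$ be a function on $I \times \mathbb{R}^n$, where
$I = [a, b] \subset \mathbb{R}$. Suppose that

i) $F$ and $\partial F/\partial t$ are continuous functions on $I \times \mathbb{R}^n$;

ii) $(\partial F/\partial t)(t, \cdot)$ is a uniformly absolutely integrable family of functions;

iii) $F(t, \cdot)$ is absolutely integrable for all $t \in I$.

Let $f(t) = \int_{\mathbb{R}^n} F(t, \cdot)$. Then $f$ is a differentiable function of $t$ and

\[ f'(t) = \int_{\mathbb{R}^n} (\partial F/\partial t)(t, \cdot). \]

Proof. Let $G(t) = \int_{\mathbb{R}^n} (\partial F/\partial t)(t, \cdot)$. Then $G(t)$ is continuous, since we can pass
to the limit under the integral sign for a uniformly absolutely integrable family of
functions. Furthermore,

\[ \int_a^t G(s)\,ds = \int_{\mathbb{R}^n} \int_a^t (\partial F/\partial t)(s, \cdot)\,ds \]

by Proposition 13.1. Thus

\[ \int_a^t G(s)\,ds = \int_{\mathbb{R}^n} \bigl( F(t, \cdot) - F(a, \cdot) \bigr) = \int_{\mathbb{R}^n} F(t, \cdot) - \int_{\mathbb{R}^n} F(a, \cdot) = f(t) - f(a). \]

Differentiating this equation with respect to $t$ gives the desired result. □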
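
Proposition 13.3 can be tested numerically on a familiar example. The sketch below assumes $F(t, x) = e^{-tx^2}$ on $[1, 2] \times \mathbb{R}$ (a choice made for the illustration, not taken from the text), for which $f(t) = \sqrt{\pi/t}$ and $f'(t) = -\tfrac{1}{2}\sqrt{\pi}\,t^{-3/2}$. The integral over the line is truncated to $[-10, 10]$ and replaced by a midpoint sum; the helper names and step sizes are hypothetical.

```python
import math

def integrate_over_line(g, half_width=10.0, n=4000):
    # Midpoint sum over [-half_width, half_width]; the tails are negligible here.
    h = 2.0 * half_width / n
    return h * sum(g(-half_width + (i + 0.5) * h) for i in range(n))

def f(t):
    # f(t) = integral of F(t, x) = exp(-t x^2) over the line.
    return integrate_over_line(lambda x: math.exp(-t * x * x))

def integral_of_dF_dt(t):
    # Integral of (dF/dt)(t, x) = -x^2 exp(-t x^2) over the line.
    return integrate_over_line(lambda x: -x * x * math.exp(-t * x * x))

def difference_quotient(t, step=1e-4):
    # Central difference approximation to f'(t).
    return (f(t + step) - f(t - step)) / (2.0 * step)

t = 1.5
exact = -0.5 * math.sqrt(math.pi) * t ** -1.5
print(integral_of_dF_dt(t), difference_quotient(t), exact)
```

The three printed numbers should agree to several decimal places, which is what the proposition predicts for this family.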


Finally, let us state the change of variables formula for absolutely integrable 
functions. 


Let $\varphi\colon U \to V$ be a differentiable one-to-one map with differentiable inverse,
where $U$ and $V$ are two open sets in $\mathbb{R}^n$. Let $f$ be an absolutely integrable
function defined on $V$. Then $(f \circ \varphi)|\det J_\varphi|$ is an absolutely integrable
function on $U$ and

\[ \int_V f = \int_U (f \circ \varphi)|\det J_\varphi|. \]


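Proof. To show that $(f \circ \varphi)|\det J_\varphi|$ is absolutely integrable, let $\epsilon > 0$ and
choose an $A_\epsilon \subset V$ such that (ii) holds. Then $\bar{A}_\epsilon$ is compact, and therefore so is
$\varphi^{-1}(\bar{A}_\epsilon)$. In particular, $\varphi^{-1}(A_\epsilon)$ is a bounded contented set and $|\det J_\varphi|$ is
bounded on it. If $B \cap \varphi^{-1}(A_\epsilon) = \emptyset$, where $B \subset U$ is bounded and contented,
then

\[ \int_B |\det J_\varphi|\,|f \circ \varphi| \le \int_{\varphi(B)} |f| < \epsilon. \]

This shows that $(f \circ \varphi)|\det J_\varphi|$ is absolutely integrable. The rest of the
proposition then follows from

\[ \int_{A_\epsilon} f = \int_{\varphi^{-1}(A_\epsilon)} (f \circ \varphi)|\det J_\varphi| \]

by letting $\epsilon \to 0$. □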


EXERCISES 


13.1 Evaluate the integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx$. [Hint: Compute its square.]

13.2 Evaluate the integral $\int_0^{\infty} e^{-x^2} x^{2k}\,dx$.

13.3 Evaluate the volume of the unit ball in an odd-dimensional space. [Hint: Observe
that the Jacobian determinant for "polar" coordinates is of the form $r^{n-1} \times f$,
where $f$ is a function of the "angular variables". Thus the volume of the unit ball is of
the form $C\int_0^1 r^{n-1}\,dr$, where $C$ is determined by integrating $f$ over the "angular
variables". Evaluate $C$ by computing $\bigl(\int_{-\infty}^{\infty} e^{-x^2}\,dx\bigr)^n$.]


14. PROBLEM SET: THE FOURIER TRANSFORM 


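Let $\alpha = \langle \alpha_1, \ldots, \alpha_n\rangle$ be an $n$-tuple whose entries are nonnegative integers.
By $D^\alpha$ we shall mean the differential operator

\[ D^\alpha = \frac{\partial^{\alpha_1 + \cdots + \alpha_n}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}. \]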




Let $|\alpha| = \alpha_1 + \cdots + \alpha_n$. Let $Q(x, D) = \sum_{|\alpha| \le k} a_\alpha(x)D^\alpha$ be the differential
operator where each $a_\alpha$ is a polynomial in $x$. Thus if $f$ is a $C^k$-function on $\mathbb{R}^n$,
we have

\[ (Qf)(x) = \sum_{|\alpha| \le k} a_\alpha(x)\,D^\alpha f(x). \]

For any $f$ which is $C^\infty$ on $\mathbb{R}^n$ we set

\[ \|f\|_Q = \sup_{x \in \mathbb{R}^n} |Qf(x)|. \]

We denote by $\mathcal{S}$ the space of all $f \in C^\infty$ such that

\[ \|f\|_Q < \infty \qquad (14.1) \]

for all $Q$. To see what this means, let us consider those $Q$ with $k = 0$. Then
(14.1) says that for any polynomial $a(\cdot)$ the function $a \cdot f$ is bounded. In other
words, $f$ vanishes at infinity faster than the inverse of any polynomial; that is,

\[ \lim_{\|x\| \to \infty} \|x\|^p f(x) = 0 \]

for all $p$. To say that (14.1) holds means that the same is true for any derivative
of $f$ as well.
If $f$ is a $C^\infty$-function of compact support, then (14.1) obviously holds, so
$f \in \mathcal{S}$. A more instructive example is provided by the function $\eta$ given by

\[ \eta(x) = e^{-\|x\|^2}. \]

Since $\lim_{r \to \infty} r^p e^{-r^2} = 0$ for any $p$, it follows that $\lim_{\|x\| \to \infty} a(x)\eta(x) = 0$. On
the other hand, it is easy to see (by induction) that $D^\alpha \eta(x) = P_\alpha(x)\eta(x)$ for
some polynomial $P_\alpha$. Thus $Q\eta(x) = P_Q(x)\eta(x)$, where $P_Q$ is a polynomial.
Thus $\eta \in \mathcal{S}$.

It is easy to see that the space $\mathcal{S}$ is a vector space. We shall introduce a
notion of convergence on this space by saying that $f_n \to f$ if for every fixed $Q$,

\[ \|f_n - f\|_Q \to 0. \]

(Note that the space $\mathcal{S}$ is not a Banach space in that convergence depends on an
infinity of different norms.)


EXERCISES 


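14.1 Let $\varphi$ be a $C^\infty$-function which grows slowly at infinity. That is, suppose that
for every $\alpha$ there is a polynomial $P_\alpha$ such that

\[ |D^\alpha \varphi(x)| \le |P_\alpha(x)| \quad\text{for all } x. \]

Show that if $f \in \mathcal{S}$, then $\varphi f \in \mathcal{S}$. Furthermore, the map of $\mathcal{S}$ into itself sending $f \mapsto \varphi f$
is continuous, that is, if $f_n \to f$, then $\varphi f_n \to \varphi f$.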




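For $x = \langle x^1, \ldots, x^n\rangle \in \mathbb{R}^n$ and $\xi = \langle \xi_1, \ldots, \xi_n\rangle \in \mathbb{R}^{n*}$ we denote the value
of $\xi$ at $x$ by

\[ (x, \xi) = x^1\xi_1 + \cdots + x^n\xi_n. \]

Also for any $\alpha = \langle \alpha_1, \ldots, \alpha_n\rangle$ and any $x \in \mathbb{R}^n$ we let

\[ x^\alpha = (x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n}, \]

and similarly $\xi^\alpha = (\xi_1)^{\alpha_1} \cdots (\xi_n)^{\alpha_n}$, etc.
For any $f \in \mathcal{S}$ we define its Fourier transform $\hat{f}$, which is a function on $\mathbb{R}^{n*}$, by

\[ \hat{f}(\xi) = \int e^{-i(x, \xi)} f(x)\,dx. \]

We note that

\[ \hat{f}(0) = \int f \quad\text{and}\quad |\hat{f}(\xi)| \le \int |f|. \]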
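14.2 Show that $\hat{f}$ possesses derivatives of all orders with respect to $\xi$ and that

\[ D_\xi^\alpha \hat{f}(\xi) = (-i)^{|\alpha|} \int e^{-i(x, \xi)} x^\alpha f(x)\,dx; \]

in other words,

\[ D_\xi^\alpha \hat{f}(\xi) = \hat{g}(\xi), \]

where $g(x) = (-i)^{|\alpha|} x^\alpha f(x)$.

14.3 Show that

\[ \widehat{\frac{\partial f}{\partial x^j}}(\xi) = i\xi_j \hat{f}(\xi). \]

[Hint: Write the integral as an iterated integral and use integration by parts with
respect to the $j$th variable.]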


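14.4 Conclude that the map $f \mapsto \hat{f}$ sends $\mathcal{S}(\mathbb{R}^n)$ into $\mathcal{S}(\mathbb{R}^{n*})$ and that if $f_n \to 0$ in $\mathcal{S}$,
then $\hat{f}_n \to 0$ in $\mathcal{S}(\mathbb{R}^{n*})$.


14.5 Show that

\[ \widehat{T_w f}(\xi) = e^{-i(w, \xi)} \hat{f}(\xi) \quad\text{for any } w \in \mathbb{R}^n. \]

Recall that $T_w f(x) = f(x - w)$.


14.6 For any $f \in \mathcal{S}$ define $\tilde{f}$ by

\[ \tilde{f}(x) = \overline{f(-x)}, \]

where $\overline{\phantom{f}}$ denotes complex conjugation. Show that

\[ \hat{\tilde{f}}(\xi) = \overline{\hat{f}(\xi)}. \]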


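14.7 Let $n = 1$, and let $f$ be an even real-valued function of $x$. Show that

\[ \hat{f}(\xi) = \int \cos(x, \xi)\,f(x)\,dx. \]

14.8 Let $\eta(x) = e^{-x^2/2}$, where $x \in \mathbb{R}^1$. Show that

\[ \frac{d\hat{\eta}}{d\xi}(\xi) = -\xi\hat{\eta}(\xi), \]

and conclude that

\[ \log \hat{\eta}(\xi) = -\tfrac{1}{2}\xi^2 + \text{const}, \]

so that

\[ \hat{\eta}(\xi) = \text{const} \times e^{-(1/2)\xi^2}. \]

Evaluate this constant as $\sqrt{2\pi}$ by setting $\xi = 0$ and using Exercise 13.1. Thus

\[ \hat{\eta}(\xi) = \sqrt{2\pi}\,e^{-\xi^2/2}. \]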
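
The formula stated at the end of Exercise 14.8 can be checked numerically, which may help catch sign or normalization slips when working the exercise. The sketch below assumes the convention $\hat{f}(\xi) = \int e^{-i(x,\xi)} f(x)\,dx$ with $n = 1$; the truncation to $[-12, 12]$, the grid size, and the function names are hypothetical choices for the illustration.

```python
import cmath
import math

def eta(x):
    # The Gaussian of Exercise 14.8.
    return math.exp(-x * x / 2.0)

def fourier_transform(f, xi, half_width=12.0, n=4800):
    # Midpoint sum for the integral of exp(-i x xi) f(x) over [-half_width, half_width].
    h = 2.0 * half_width / n
    total = 0j
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += cmath.exp(-1j * x * xi) * f(x)
    return total * h

def eta_hat_exact(xi):
    # The value stated in Exercise 14.8.
    return math.sqrt(2.0 * math.pi) * math.exp(-xi * xi / 2.0)

for xi in (0.0, 1.0, 2.5):
    print(xi, fourier_transform(eta, xi), eta_hat_exact(xi))
```

The imaginary parts come out at rounding level, as expected for an even real-valued function (compare Exercise 14.7).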


14.9 Show that the limit $\lim_{r \to \infty} \int_0^r (\sin x)/x\,dx$ exists. Let us call this limit $d$.
Show that for any $R > 0$, $\lim_{r \to \infty} \int_0^r (\sin Rx)/x\,dx = d$.


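If $f \in \mathcal{S}$, we have seen that $\hat{f} \in \mathcal{S}(\mathbb{R}^{n*})$. We can therefore consider the function

\[ \int e^{i(y, \xi)} \hat{f}(\xi)\,d\xi. \]

The purpose of the next few exercises is to show that

\[ f(y) = \frac{1}{(2\pi)^n} \int e^{i(y, \xi)} \hat{f}(\xi)\,d\xi. \qquad (14.2) \]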


We first remark that since all integrals involved are absolutely convergent, it
suffices to show that

\[ f(y) = \lim_{R_1 \to \infty} \cdots \lim_{R_n \to \infty} \frac{1}{(2\pi)^n} \int_{-R_n}^{R_n} \cdots \int_{-R_1}^{R_1} e^{i(y, \xi)} \hat{f}(\xi)\,d\xi_1 \cdots d\xi_n. \]

Substituting the definition of $\hat{f}$ into this formula and interchanging the order of
integration with respect to $x$ and $\xi$, we get

\[ \lim_{R_1 \to \infty} \cdots \lim_{R_n \to \infty} \frac{1}{(2\pi)^n} \int \int_{-R_n}^{R_n} \cdots \int_{-R_1}^{R_1} f(x^1, \ldots, x^n)\,e^{i[(y^1 - x^1)\xi_1 + \cdots + (y^n - x^n)\xi_n]}\,d\xi_1 \cdots d\xi_n\,dx. \]

It therefore suffices to evaluate this limit one variable at a time (provided the
convergence is uniform, which will be clear from the proof). We have thus reduced the problem
to functions of one variable. We must show that if $f \in \mathcal{S}(\mathbb{R}^1)$, then

\[ f(y) = \lim_{R \to \infty} \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-R}^{R} f(x)\,e^{i(y - x)\xi}\,d\xi\,dx. \]
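We shall first show that

\[ f(y) = \lim_{R \to \infty} \frac{1}{4d} \int_{-\infty}^{\infty} \int_{-R}^{R} f(x)\,e^{i(y - x)\xi}\,d\xi\,dx, \]

where $d$ is given in Exercise 14.9.

14.10 Show that this last integral can be written as

\[ \frac{1}{2d} \int_{-\infty}^{\infty} f(x)\,\frac{\sin R(y - x)}{y - x}\,dx = \frac{1}{d} \int_0^{\infty} \frac{f(y - u) + f(y + u)}{2}\,\frac{\sin Ru}{u}\,du. \]

14.11 Let

\[ g(u) = \frac{f(y - u) + f(y + u)}{2} - f(y). \]

Show that $g(0) = 0$ and conclude that $g(u) = uh(u)$ for $0 \le u \le 1$, where $h \in C^\infty$. By
integrating by parts, show that

\[ \Bigl| \int_0^1 g(u)\,\frac{\sin Ru}{u}\,du + \int_1^{\infty} \frac{g(u)}{u}\,\sin Ru\,du \Bigr| \le \text{const}\,\frac{1}{R}. \]

Conclude that

\[ \lim_{R \to \infty} \frac{1}{d} \int_0^{\infty} \frac{f(y - u) + f(y + u)}{2}\,\frac{\sin Ru}{u}\,du = f(y). \]

This proves that

\[ f(y) = \frac{1}{4d} \int e^{iy\xi} \hat{f}(\xi)\,d\xi. \]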
14.12 Using Exercise 14.8, conclude that $d = \pi/2$.
Let $f_1 \in \mathcal{S}$ and $f_2 \in \mathcal{S}$. Define the function $f_1 * f_2$ by setting

\[ f_1 * f_2(x) = \int f_1(x - y)f_2(y)\,dy. \]

Note that this makes good sense, since the integral on the right clearly converges for
each fixed value of $x$. We can be more precise. Since $f_i \in \mathcal{S}$, we can, for any integer $p$,
find a $K_p$ such that

_ Kp 
lfey)| < T+ T+ lvl? ? 


so that 
| If cut 
tyl>k 2(y)| <n 1+ Rk? 
Then 
JO MWA — DFO) dy = fo yey EE MA — WiFole) dy 
f G+ [loll fie — v)fol@) ay. 
Dy D>{1/2) Ke I 


The first integral is at most 
Co(Slz||)"(L + [lel|)* max | fo(2)| 


while the second is at most 


(1+ Hall) max [fi(u)| 


it cer’ 


Lo(fllall)” 
1+ Glel? 


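By choosing $p > q + n$, we see that both terms go to zero. Thus

\[ \lim_{\|x\| \to \infty} (1 + \|x\|)^q\,f_1 * f_2(x) = 0. \]

14.13 Show that

\[ \frac{\partial}{\partial x^j}(f_1 * f_2) = \frac{\partial f_1}{\partial x^j} * f_2 = f_1 * \frac{\partial f_2}{\partial x^j}. \]

Conclude that $f_1 * f_2 \in \mathcal{S}$.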




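14.14 Show that if $\varphi$ is any bounded continuous function on $\mathbb{R}^n$, then

\[ \int\!\!\int \varphi(x + y)\,f_1(x)f_2(y)\,dx\,dy = \int \varphi(u)\,(f_1 * f_2)(u)\,du. \]

14.15 Conclude that

\[ \widehat{f_1 * f_2}(\xi) = \hat{f}_1(\xi)\hat{f}_2(\xi). \]

14.16 Show that

\[ f * \tilde{f}(y) = \Bigl( \frac{1}{2\pi} \Bigr)^n \int |\hat{f}(\xi)|^2 e^{i(y, \xi)}\,d\xi. \]

14.17 Conclude that for any $f \in \mathcal{S}$,

\[ \int |f|^2 = \Bigl( \frac{1}{2\pi} \Bigr)^n \int |\hat{f}|^2. \qquad (14.3) \]

[Hint: Set $y = 0$ in Exercise 14.16.]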


The following exercises use the Fourier transform to develop facts which are useful
in the study of partial differential equations. We will make use of these facts at the
end of the last chapter. The reader may prefer to postpone his study of these problems
until then.

On the space $\mathcal{S}$, define the norm $\|\ \|_s$ by setting

\[ \|f\|_s^2 = \frac{1}{(2\pi)^n} \int (1 + \|\xi\|^2)^s\,|\hat{f}(\xi)|^2\,d\xi \]

and the scalar product $(f, g)_s$ by

\[ (f, g)_s = \frac{1}{(2\pi)^n} \int (1 + \|\xi\|^2)^s\,\hat{f}(\xi)\overline{\hat{g}(\xi)}\,d\xi. \]

14.18 Let $s = k$ be a nonnegative integer. Show that

\[ \|f\|_k^2 = \sum_{|\alpha| \le k} \frac{k!}{\alpha!\,(k - |\alpha|)!} \int |D^\alpha f(x)|^2\,dx, \]

where $\alpha! = \alpha_1! \cdots \alpha_n!$. [Use the multinomial theorem, a repeated application of
Exercise 14.3, and Eq. (14.3).]

We thus see that $\|f\|_k$ measures the size of $f$ and its derivatives to order $k$ in
the square integral norm. It is helpful to think of $\|\ \|_s$ as a generalization of this notion
of size, where now $s$ can be an arbitrary real number.

Note that

\[ \|f\|_s \le \|f\|_t \quad\text{if } s \le t. \]

For any real $s$ define the operator $K^s$ by setting

\[ \widehat{K^s f}(\xi) = (1 + \|\xi\|^2)^s \hat{f}(\xi). \]
14.19 Show that the operator $K = K^1$ is given by

\[ Kf = f - \sum_{j=1}^n \frac{\partial^2 f}{\partial (x^j)^2}. \]

14.20 Show that for any real numbers $s$ and $t$,

\[ \|K^s f\|_t = \|f\|_{t+2s} \]

and

\[ (K^s f, g)_t = (f, K^s g)_t = (f, g)_{t+s}. \]

14.21 Show that $K^{s+t} = K^s \circ K^t$, so that, in particular, $K^s$ is invertible for all $s$.


We now define the space $H_s$ to be the completion of $\mathcal{S}$ under the norm $\|\ \|_s$. The
space $H_s$ is a Hilbert space with a scalar product $(\ ,\ )_s$. We can think of the elements
of $H_s$ as "generalized functions with generalized derivatives up to order $s$". By
construction, the space $\mathcal{S}$ is a dense subspace of $H_s$ in the norm $\|\ \|_s$. We note that
Exercise 14.20 implies that the operator $K^s$ can be extended to an isometric map of $H_t$ into
$H_{t-2s}$. We shall also denote this extended map by $K^s$. By Exercise 14.21,

\[ K^{-s}\colon H_{t-2s} \to H_t \]

is the inverse of $K^s$, so that $K^s$ is a norm-preserving isomorphism of $H_t$ onto $H_{t-2s}$.
14.22 Let $u \in H_s$ and $v \in H_{-s}$. Show that

\[ |(u, v)_0| \le \|u\|_s \|v\|_{-s}. \]

Thus we can extend $\langle u, v\rangle \mapsto (u, v)_0$ to a function on $H_s \times H_{-s}$ which is linear in
$u$ and antilinear in $v$ [that is, $(u, av_1 + bv_2)_0 = \bar{a}(u, v_1)_0 + \bar{b}(u, v_2)_0$] and satisfies the
above inequality. Thus any $v \in H_{-s}$ defines a bounded linear function, $l$, on $H_s$ by
$l(u) = (u, v)_0$.

14.23 Conversely, let $l$ be a bounded linear function on $H_s$. Show that there is a
$v \in H_{-s}$ with $l(u) = (u, v)_0$ for all $u \in H_s$. [Hint: Consider the linear form $v = K^s w$,
where $w$ is a suitable element of $H_s$, using Theorem 2.4 of Chapter 5.]


14.24 Show that

\[ \|v\|_{-s} = \sup_{\substack{u \in H_s \\ u \ne 0}} \frac{|(u, v)_0|}{\|u\|_s}. \]

(Exercise 14.22 gives an inequality. If $v \ne 0$, take $u = K^{-s}v$ to get

\[ (u, v)_0 = (K^{-s}v, v)_0 = \|v\|_{-s}^2 = \|v\|_{-s}\|u\|_s, \]

in order to get an equality.)
14.25 Let $2s > n$ (where our functions are defined on $\mathbb{R}^n$). Show that for any $f \in \mathcal{S}$
we have

\[ \sup_x |f(x)| \le \|f\|_s \Bigl[ \frac{1}{(2\pi)^n} \int (1 + \|\xi\|^2)^{-s}\,d\xi \Bigr]^{1/2} \quad\text{(Sobolev's inequality).} \]

(Use Eq. (14.2), Schwarz's inequality, and the fact that the integral on the right of
the inequality is absolutely convergent.)


Sobolev's inequality shows that the injection of $\mathcal{S}$ into $C(\mathbb{R}^n)$ extends to a continuous
injection of $H_s$ into $C(\mathbb{R}^n)$, where $C(\mathbb{R}^n)$ is given the uniform norm. We can thus
regard the elements of $H_s$ as actual functions on $\mathbb{R}^n$ if $s > n/2$.




By induction on $|\alpha|$ we can assert that for $s > n/2$, any $f \in H_{|\alpha|+s}$ has $|\alpha|$
continuous derivatives and

\[ \sup_x |D^\alpha f(x)| \le C_\alpha \|f\|_{|\alpha|+s}. \qquad (14.4) \]


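14.26 Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. Let $\varphi \in \mathcal{S}$ satisfy $\operatorname{supp} \varphi \subset \Omega$. Show
that

\[ |\hat{\varphi}(\xi)| \le \mu(\Omega)^{1/2} \|\varphi\|_0 \quad\text{for all } \xi. \]


14.27 Let $d^i = \operatorname{lub}_{x \in \Omega} |x^i|$, and let $d^\alpha = (d^1)^{\alpha_1} \cdots (d^n)^{\alpha_n}$. Show that

\[ |D^\alpha \hat{\varphi}(\xi)| \le d^\alpha \mu(\Omega)^{1/2} \|\varphi\|_0. \]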
14.28 Show that 
[F6B(E)| < w(Q)/?| Degliy, 
[1+ UNEl?)*e7(8] S aD lel. 


14.29 More generally, let ¥ be a function in $ which satisfics ¥(%) = 1 for all « EQ, 
and let » € § satisfy supp ¢ C2. Show that 


la(H)| = |(y, #0] S MlellellWell—e, 
where ¥e(2) = ¥(x) eo, and that 
| Die()| < llellellWFll—s, 


and conclude that 


where Wi(x) = 2p (aye 828, 


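Let us denote by $H_t^\Omega$ the completion under $\|\ \|_t$ of the space of those functions in $\mathcal{S}$
whose supports lie in $\Omega$. According to Exercise 14.29, any $\varphi \in H_t^\Omega$ defines an actual
function $\hat\varphi$ of $\xi$ which is differentiable and satisfies
$$|D^\alpha\hat\varphi(\xi)| \le \|\varphi\|_t\,\|\psi_\xi^\alpha\|_{-t},$$
where $\|\psi_\xi^\alpha\|_{-t}$ depends only on $\Omega$, $\alpha$, $\xi$, and $t$, and is independent of $\varphi$. Furthermore,
$\|\varphi\|_t^2 = \int (1 + \|\xi\|^2)^t\,|\hat\varphi(\xi)|^2\,d\xi$.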

14.30 Let $s < t$. Then the injection $H_t^\Omega \to H_s$ is a compact mapping. That is, if
$\{\varphi_i\}$ is a sequence of elements of $H_t^\Omega$ such that $\|\varphi_i\|_t \le 1$ for all $i$, then we can select a
subsequence $\{\varphi_{i_j}\}$ which converges in $\|\ \|_s$. [Hint: By Exercise 14.29, the sequence of
functions $\hat\varphi_i(\xi)$ is bounded and equicontinuous on $\{\xi : \|\xi\| \le r\}$ for any fixed $r$. We
can thus choose a subsequence which converges uniformly and therefore a subsubsequence
which converges on $\{\xi : \|\xi\| \le r\}$ for all $r$ (the uniformity possibly depending
on $r$). Then if $\{\varphi_{i_j}\}$ is this subsubsequence,


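$$\begin{aligned}
\|\varphi_{i_j} - \varphi_{i_k}\|_s^2 &= \int (1 + \|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi) - \hat\varphi_{i_k}(\xi)|^2\,d\xi \\
&= \int_{\|\xi\| \le r} (1 + \|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi) - \hat\varphi_{i_k}(\xi)|^2\,d\xi
 + \int_{\|\xi\| > r} (1 + \|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi) - \hat\varphi_{i_k}(\xi)|^2\,d\xi \\
&\le \int_{\|\xi\| \le r} (1 + \|\xi\|^2)^s\,|\hat\varphi_{i_j}(\xi) - \hat\varphi_{i_k}(\xi)|^2\,d\xi
 + (1 + r^2)^{s-t}\,\|\varphi_{i_j} - \varphi_{i_k}\|_t^2.
\end{aligned}$$
]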


CHAPTER 9 


DIFFERENTIABLE MANIFOLDS 


Thus far our study of the calculus has been devoted to the study of properties 
of and operations on functions defined on (subsets of) a vector space. One of 
the ideas used was the approximation of possibly nonlinear functions at each
point by linear functions. In this chapter we shall generalize our notion of space
to include spaces which cannot, in any natural way, be regarded as open subsets
of a vector space. One of the tools we shall use is the “approximation” of such a
space at each point by a linear space. 

Suppose we are interested in studying functions on (the surface of) the unit
sphere in $E^3$. The sphere is a two-dimensional object in the sense that we can
describe a neighborhood of every point of the sphere in a bicontinuous way by
two coordinates. On the other hand, we cannot map the sphere in a bicontinuous
one-to-one way onto an open subset of the plane (since the sphere is compact and
an open subset of $E^2$ is not). Thus pieces of the sphere can be described by open
subsets of $E^2$, but the whole sphere cannot. Therefore, if we want to do calculus
on the whole sphere at once, we must introduce a more general class of spaces
and study functions on them.

Even if a space can be regarded as a subset of a vector space, it is conceivable 
that it cannot be so regarded in any canonical way. Thus the state of a (homogeneous
ideal) gas in equilibrium is specified when one gives any two of the three
parameters: temperature, pressure, or volume. There is no reason to prefer any
two to the third. The transition from one set of parameters to the other is given
by a one-to-one bidifferentiable map. Thus any function of the states of the
gas which is a differentiable function in terms of one choice of parameters is
differentiable in terms of any other. Thus it makes sense to talk of differentiable
functions on the states of the gas. However, a function which is linear in terms
of one choice of parameters need not be linear in terms of the other. Thus it
doesn’t really make sense to talk of linear functions on the states of the gas.
In such a situation we would like to know what properties of functions and what
operations make sense in the space and are not artifacts of the description we 
give of the space. 

Finally, even in a vector space it is sometimes convenient to introduce 
“nonlinear coordinates” for the solution of specific problems: for example, polar 
coordinates in Exercises 11.3 and 11.4, Chapter 8. We would therefore like to 
know how various objects change when we change coordinates and, if possible, 
to introduce notation which is independent of the coordinate system. 




We will begin our formal discussion with the definition of differentiable 
manifolds. The basic idea is similar to the one that is used in everyday life to 
describe the surface of the earth. One gives a collection of charts describing small 
overlapping portions of the globe. We can piece the whole picture together by 
seeing how the charts match up. 


1. ATLASES 


Let $M$ be a set. Let $V$ be a Banach space. (For almost all our applications we
shall take $V$ to be $\mathbb{R}^n$ for some integer $n$.) A $V$-atlas of class $C^k$ on $M$ is a collection
$\mathcal{A}$ of pairs $(U_i, \varphi_i)$ called charts, where $U_i$ is a subset of $M$ and $\varphi_i$ is a bijective
map of $U_i$ onto an open subset of $V$ subject to the following conditions
(Fig. 9.1):

A1. For any $(U_i, \varphi_i) \in \mathcal{A}$ and $(U_j, \varphi_j) \in \mathcal{A}$ the sets $\varphi_i(U_i \cap U_j)$ and
$\varphi_j(U_i \cap U_j)$ are open subsets of $V$, and the maps
$$\varphi_i \circ \varphi_j^{-1}: \varphi_j(U_i \cap U_j) \to \varphi_i(U_i \cap U_j)$$
are differentiable of class $C^k$.

A2. $\bigcup U_i = M$.

The functions $\varphi_i \circ \varphi_j^{-1}$ are called the transition functions of the atlas $\mathcal{A}$.
The following are examples of sets with atlases. 


Example 1. The trivial example. Let $M$ be an open subset of $V$. If we take $\mathcal{A}$
to consist of the single element $(U, \varphi)$, where $U = M$ and $\varphi: U \to V$ is the
identity map, then Axioms A1 and A2 are trivially fulfilled.


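Example 2. The sphere. Let $M = S^n$ denote the subset of $\mathbb{R}^{n+1}$ given by
$(x^1)^2 + \cdots + (x^{n+1})^2 = 1$. Let the set $U_1$ consist of those points for which
$x^{n+1} > -1$, and let $U_2$ consist of those points for which $x^{n+1} < 1$. Let
$$\varphi_1: U_1 \to \mathbb{R}^n$$
be given by
$$y^i \circ \varphi_1(x^1, \ldots, x^{n+1}) = \frac{x^i}{1 + x^{n+1}}, \qquad i = 1, \ldots, n,$$
where $y^1, \ldots, y^n$ are coordinates on $\mathbb{R}^n$. Thus the map $\varphi_1$ is given by the
projection from the “south pole”, $\langle 0, \ldots, 0, -1\rangle$, to $\mathbb{R}^n$ regarded as the
equatorial plane (see Fig. 9.2). Similarly, define $\varphi_2$ by
$$y^i \circ \varphi_2(x^1, \ldots, x^{n+1}) = \frac{x^i}{1 - x^{n+1}}.$$
Then $\varphi_1(U_1 \cap U_2) = \varphi_2(U_1 \cap U_2) = \{y \in \mathbb{R}^n : y \ne 0\}$. Now
$$\sum_i \bigl(y^i \circ \varphi_1(x^1, \ldots, x^{n+1})\bigr)^2 = \frac{\sum_i (x^i)^2}{(1 + x^{n+1})^2}
 = \frac{1 - (x^{n+1})^2}{(1 + x^{n+1})^2} = \frac{1 - x^{n+1}}{1 + x^{n+1}}.$$
Thus
$$y^i \circ \varphi_2(x^1, \ldots, x^{n+1}) = \frac{y^i \circ \varphi_1(x^1, \ldots, x^{n+1})}{\sum_j \bigl(y^j \circ \varphi_1(x^1, \ldots, x^{n+1})\bigr)^2},$$
or
$$\varphi_2(x) = \frac{\varphi_1(x)}{\|\varphi_1(x)\|^2}.$$
In other words, the map $\varphi_2 \circ \varphi_1^{-1}$, defined for all $y \ne 0$, is given by
$$\varphi_2 \circ \varphi_1^{-1}(y) = \frac{y}{\|y\|^2}.$$
Thus conditions A1 and A2 are fulfilled.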
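
The formula for the transition function of the sphere can be checked numerically. The following Python sketch is an illustration added here, not part of the original computation; the names `phi1`, `phi2`, `phi1_inverse`, and `transition` are hypothetical, and the expression used for the inverse of the south-pole projection is an assumption obtained by solving the defining equations on the sphere.

```python
def phi1(x):
    """Projection from the south pole: y^i = x^i / (1 + x^{n+1})."""
    *head, last = x
    return [xi / (1.0 + last) for xi in head]

def phi2(x):
    """Projection from the north pole: y^i = x^i / (1 - x^{n+1})."""
    *head, last = x
    return [xi / (1.0 - last) for xi in head]

def phi1_inverse(y):
    """The point of S^n whose south-pole projection is y (assumed closed form)."""
    r2 = sum(yi * yi for yi in y)
    return [2.0 * yi / (1.0 + r2) for yi in y] + [(1.0 - r2) / (1.0 + r2)]

def transition(y):
    """phi2 o phi1^{-1}, defined for y != 0."""
    return phi2(phi1_inverse(y))

y = [0.3, -1.2]                      # a nonzero point of R^2, so n = 2
r2 = sum(yi * yi for yi in y)
expected = [yi / r2 for yi in y]     # the closed form y / ||y||^2
computed = transition(y)
```

Comparing `computed` with `expected` for a few nonzero points is a quick way to catch a sign error in either projection.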


Fig. 9.2


Note that the atlas we gave for the sphere contains only two charts (each 
given by polar projection). An atlas of the earth usually contains many more 
charts. In other words, many different atlases can be used to describe the same 
set. We shall return to this point later. 




Fig. 9.3 Fig. 9.4 


Example 3. The circle. The circle $S^1$ is a “one-dimensional sphere” and therefore
has an atlas as described in Example 2. We wish to describe a different atlas
on $S^1$. Regard $S^1$ as the unit circle $(x^1)^2 + (x^2)^2 = 1$, and consider the function $\theta_1$,
defined in a neighborhood of $\langle 1, 0\rangle$ on the upper semicircle of $S^1$, which gives
the angle from the point on $S^1$ to $\langle 1, 0\rangle$ (see Fig. 9.3). As we move counterclockwise
around the circle, this function is well defined until we hit $\langle 1, 0\rangle$
again. We will take, as the first chart in our atlas, $(U_1, \theta_1)$, where $U_1 =
S^1 - \{\langle 1, 0\rangle\}$ and $\theta_1$ is the function defined above. Let $U_2 = S^1 - \{\langle 0, 1\rangle\}$,
and define $\theta_2$ to be $\pi/2$ plus the angle (measured counterclockwise) from
$\langle 0, 1\rangle$ (see Fig. 9.4). Now $U_1 \cap U_2 = S^1 - \{\langle 1, 0\rangle, \langle 0, 1\rangle\}$, and
$$\theta_1(U_1 \cap U_2) = (0, 2\pi) - \{\pi/2\}.$$
Also,
$$\theta_2(U_1 \cap U_2) = (\pi/2, 2\pi + \pi/2) - \{2\pi\}.$$
The map $\theta_2 \circ \theta_1^{-1}$ is given by
$$\theta_2 \circ \theta_1^{-1}(x) = \begin{cases} x + 2\pi & \text{if } 0 < x < \pi/2, \\ x & \text{if } \pi/2 < x < 2\pi. \end{cases}$$

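Example 4. The product of two atlases. Let $\mathcal{A} = \{(U_i, \varphi_i)\}$ be a $V_1$-atlas on a set
$M$, and let $\mathcal{B} = \{(W_j, \psi_j)\}$ be a $V_2$-atlas on a set $N$, where $V_1$ and $V_2$ are
Banach spaces. Then the collection $\mathcal{C} = \{(U_i \times W_j, \varphi_i \times \psi_j)\}$ is a $(V_1 \times V_2)$-atlas
on $M \times N$. Here $\varphi_i \times \psi_j(p, q) = \langle\varphi_i(p), \psi_j(q)\rangle$ if $\langle p, q\rangle \in U_i \times W_j$.
It is easy to check that $\mathcal{C}$ satisfies conditions A1 and A2. We shall call $\mathcal{C}$ the
product of $\mathcal{A}$ and $\mathcal{B}$ and write $\mathcal{C} = \mathcal{A} \times \mathcal{B}$.

For instance, let $M = (0, 1) \subset \mathbb{R}^1$ and $N = S^1$. Then we can regard
$M \times N$ as a cylinder or an annulus. If $M = N = S^1$, then $M \times N$ is a torus.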


(Figure: a cylinder and an annulus.)


It is an instructive exercise to write down the atlases and transition functions 
explicitly in these cases. 


Example 5. As a generalization of our first example, let $S$ be a submanifold of
an $(n + m)$-dimensional vector space $X$, as defined in Section 12 of Chapter 3.
For each neighborhood $N$ defined there, the set $S \cap N$, together with the map $\varphi$
which is defined as the projection $\pi_1$ restricted to $S$, provides a chart with
values in $V$ (where $X$ is viewed as $V \times W$). In such a neighborhood $N$ the set $S$
is presented as the graph of a function $F$. In other words,
$$S \cap N = \{\langle x, F(x)\rangle \in V \times W : x \in \pi_1(S \cap N)\},$$
where $F$ is a smooth map of $A = \pi_1(S \cap N)$ into $W$. Let $N'$ be another such
neighborhood with corresponding projection $\pi_1'$ (where now $X$ is identified with
$V \times W$ in some other way). Then $\varphi' \circ \varphi^{-1}(x) = \pi_1'(x, F(x))$, which shows
that $\varphi' \circ \varphi^{-1}$ is a smooth map. Thus every submanifold in the sense of Chapter 3
possesses an atlas.


Exercise. Let $P^n$ (projective $n$-space) denote the space of all lines through the origin
in $\mathbb{R}^{n+1}$. Any such line is determined by a nonzero vector lying on the line. Two such
vectors, $\langle x^1, \ldots, x^{n+1}\rangle$ and $\langle y^1, \ldots, y^{n+1}\rangle$, determine the same line if and only
if they differ by a factor, that is, $y^i = \lambda x^i$ for all $i$, where $\lambda$ is some (nonzero) real
number. We can thus regard an element of $P^n$ as an equivalence class of nonzero
vectors. For each $i$ between 1 and $n + 1$, let $U_i \subset P^n$ be the set of those elements
coming from vectors with $x^i \ne 0$. Map
$$\alpha_i: U_i \to \mathbb{R}^n$$
by sending
$$\langle x^1, \ldots, x^{n+1}\rangle \mapsto \left\langle \frac{x^1}{x^i}, \ldots, \frac{x^{i-1}}{x^i}, \frac{x^{i+1}}{x^i}, \ldots, \frac{x^{n+1}}{x^i} \right\rangle.$$
Show that the map $\alpha_i$ is well defined and that $\{(U_i, \alpha_i)\}$ is an atlas on $P^n$.
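
Before attempting the exercise it may help to see the chart maps at work on concrete vectors. The Python sketch below is an illustration only and is no substitute for the proof; the function name `alpha` and the sample vectors are hypothetical.

```python
def alpha(i, x):
    """Chart alpha_i on P^n (i is 1-based): divide by x^i and drop the i-th slot."""
    xi = x[i - 1]
    if xi == 0:
        raise ValueError("the line through x does not lie in U_i")
    return [c / xi for k, c in enumerate(x, start=1) if k != i]

x = [2.0, -4.0, 8.0]             # a nonzero vector in R^3, so n = 2
scaled = [-2.5 * c for c in x]   # another vector on the same line through the origin
same_image = alpha(2, x) == alpha(2, scaled)
```

Both representatives of the line give the same coordinates, which is the numerical shadow of the statement that $\alpha_i$ is well defined.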


2. FUNCTIONS, CONVERGENCE 


Let $\mathcal{A}$ be a $V$-atlas of class $C^k$ on a set $M$. Let $f$ be a real-valued function defined
on $M$. For a chart $(U_i, \varphi_i)$ we obtain a function $f_i$ defined on $\varphi_i(U_i)$ by setting
$$f_i = f \circ \varphi_i^{-1}. \tag{2.1}$$
The function $f_i$ can be regarded as the “local expression of $f$” in terms of the
chart $(U_i, \varphi_i)$. In general, the functions $f_i$ will look quite different from one
another. For example, let $M = S^n$, let $\mathcal{A}$ be the atlas described, and let $f$ be the
function on the sphere assigning to the point $\langle x^1, \ldots, x^{n+1}\rangle$ the value $x^{n+1}$.
Then
$$f_1(y) = f \circ \varphi_1^{-1}(y) = \frac{2}{1 + \|y\|^2} - 1,$$
while
$$f_2(y) = f \circ \varphi_2^{-1}(y) = 1 - \frac{2}{1 + \|y\|^2},$$
as one can check by solving the equations.


Returning to the general discussion, we observe that the functions $f_i$ are
not completely independent of one another. In fact, it follows from the definition
(2.1) that we have
$$f_i \circ \varphi_i \circ \varphi_j^{-1} = f_j \quad \text{on } \varphi_j(U_i \cap U_j). \tag{2.2}$$
[Thus in the example cited above we indeed have $f_2(y) = f_1(y/\|y\|^2)$, as is
required by (2.2).]
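
The compatibility condition (2.2) for the sphere can be tested on sample points. The sketch below is an added illustration; it takes $f_1(y) = 2/(1 + \|y\|^2) - 1$, $f_2(y) = 1 - 2/(1 + \|y\|^2)$, and the transition function $y \mapsto y/\|y\|^2$ as given, and the Python names are hypothetical.

```python
def f1(y):
    """Local expression of f = x^{n+1} in the south-pole chart."""
    r2 = sum(c * c for c in y)
    return 2.0 / (1.0 + r2) - 1.0

def f2(y):
    """Local expression of f = x^{n+1} in the north-pole chart."""
    r2 = sum(c * c for c in y)
    return 1.0 - 2.0 / (1.0 + r2)

def phi1_after_phi2_inverse(y):
    """Transition function phi_1 o phi_2^{-1}(y) = y / ||y||^2, for y != 0."""
    r2 = sum(c * c for c in y)
    return [c / r2 for c in y]

y = [0.7, -0.4, 1.5]                   # a nonzero point of R^3, so n = 3
lhs = f2(y)                            # f_2(y)
rhs = f1(phi1_after_phi2_inverse(y))   # f_1(phi_1 o phi_2^{-1}(y))
```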

We now come to a simple but important observation. Suppose we start
with a collection of functions $\{f_i\}$, each $f_i$ defined on $\varphi_i(U_i)$, and such that (2.2)
holds. Then there exists a unique function $f$ on $M$ such that $f_i = f \circ \varphi_i^{-1}$. In
fact, define $f$ by setting $f(p) = f_i(\varphi_i(p))$ if $p \in U_i$. For $f$ to be well defined, we
must be sure that this definition is consistent, i.e., that if $p$ is also in $U_j$, then
$f_i(\varphi_i(p)) = f_j(\varphi_j(p))$, but this is exactly what (2.2) says.

We can thus think of a real-valued function in two ways: as either 

i) an object defined invariantly on $M$, i.e., a map from $M$ to $\mathbb{R}$, or


ii) a collection of objects (in this case functions) one defined for each chart 
and satisfying certain “transition laws”, namely (2.2). 


This dual way of looking at objects on $M$ will recur quite frequently in what
follows. 

Let $M$ be a set with an atlas of class $C^k$. We will say that a function $f$ is of
class $C^l$ ($l \le k$) if each of the functions $f_i$ defined by (2.1) is of class $C^l$. Note
that since $l \le k$, this can happen without any interference from (2.2). If $f_i \in C^l$
and $\varphi_i \circ \varphi_j^{-1} \in C^k$ ($k \ge l$), then $f_i \circ (\varphi_i \circ \varphi_j^{-1}) \in C^l$. If $l$ were larger than $k$,
then in general $f_j$ would not be of class $C^l$ if $f_i$ were, and there would be very
few functions of class $C^l$.

Since we will not wish to constantly specify degrees of differentiability of
our atlas, from now on when we speak of an atlas we shall mean an atlas of class $C^\infty$.

Let $M$ be a set with an atlas $\mathcal{A}$. We shall say that a sequence of points
$\{x_k \in M\}$ converges to $x \in M$ if

i) there exists a chart $(U_i, \varphi_i) \in \mathcal{A}$ and an integer $N_i$ such that $x \in U_i$ and
for all $k > N_i$, $x_k \in U_i$;

ii) $\{\varphi_i(x_k)\}_{k > N_i}$ converges to $\varphi_i(x)$.

Note that if $(U_j, \varphi_j)$ is any other chart with $x \in U_j$, then there exists an $N_j$
such that $x_k \in U_j$ for $k > N_j$ and $\varphi_j(x_k) \to \varphi_j(x)$. In fact, choose $N_j$ so
that $\varphi_i(x_k) \in \varphi_i(U_i \cap U_j)$ for all $k > N_j$. (This is possible since $\varphi_i(U_i \cap U_j)$
is open by A1.) The fact that the $\varphi_j(x_k)$ converge to $\varphi_j(x)$ follows from the
continuity of $\varphi_j \circ \varphi_i^{-1}$. It thus makes good sense to say that $\{x_k\}$ converges to $x$.


Warning. It does not make sense to say that a sequence $\{x_k\}$ is a Cauchy
sequence. Thus, for example, let $M = S^n$ with the atlas described above. If $\{x_k\}$
is a sequence of points converging to the north pole in $S^n$, then $\varphi_1(x_k) \to 0$,
while $\varphi_2(x_k) \to \infty$. This example becomes even more sticky if we remove the
north pole, i.e., let $M = S^n - \{\langle 0, \ldots, 0, 1\rangle\}$ and define the charts as before.
Then $\{x_k\}$ has no limit (in $M$). Clearly, $\{\varphi_1(x_k)\}$ is a Cauchy sequence, while
$\{\varphi_2(x_k)\}$ is not.
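
The warning can be made concrete on the circle $S^1$. In the Python sketch below (an added illustration with hypothetical names), a sequence of points approaches the north pole; its images under the south-pole projection shrink to 0, while its images under the north-pole projection grow without bound.

```python
import math

def phi1(p):
    """South-pole projection of a point p = (x, z) on the circle S^1."""
    x, z = p
    return x / (1.0 + z)

def phi2(p):
    """North-pole projection of a point p = (x, z) on the circle S^1."""
    x, z = p
    return x / (1.0 - z)

# x_k approaches the north pole <0, 1> along the circle as k grows.
points = [(math.sin(1.0 / k), math.cos(1.0 / k)) for k in range(1, 200)]
in_chart1 = [phi1(p) for p in points]   # tends to 0
in_chart2 = [phi2(p) for p in points]   # grows without bound
```

The same sequence therefore looks Cauchy in one chart and not in the other, so "Cauchy" is not a notion attached to the set with its atlas.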


Once we have a notion of convergence, we can talk about such things as 
open sets and closed sets. We could also define them directly. For instance, a set 
$U$ is open if $\varphi_i(U \cap U_i)$ is an open subset of $\varphi_i(U_i)$ for all charts $(U_i, \varphi_i)$, and so on.


EXERCISES 


2.1 Show that the above definition of a set's being open is consistent, i.e., that there
exist nonempty open sets. (In fact, each of the $U_i$'s is open.)

2.2 Show that a sequence $\{x_k\}$ converges to $x$ if and only if for every open set $U$
containing $x$ there is an $N_U$ with $x_k \in U$ for $k > N_U$.


Let $\mathcal{A} = \{(U_i, \varphi_i)\}$ be an atlas on $M$, and let $U$ be an open subset of $M$ relative
to this atlas. Let $\mathcal{A} \restriction U$ be the collection of all pairs $(U_i \cap U, \varphi_i \restriction U)$. It is
easy to check that $\mathcal{A} \restriction U$ is an atlas on $U$. We shall call it the restriction of
$\mathcal{A}$ to $U$.

Let $f$ be a function defined on the open set $U$. We say that $f$ is of class $C^l$
on $U$ if it is of class $C^l$ relative to the atlas $\mathcal{A} \restriction U$ on $U$. For later convenience
we shall say that a function $f$ defined on a subset of $M$ is of class $C^l$ if

i) the domain of $f$ is some open set $U$ of $M$, and
ii) $f$ is of class $C^l$ on $U$.


3. DIFFERENTIABLE MANIFOLDS


In our discussion of the examples in Section 1, the particular choice of atlas that 
we made in each case was rather arbitrary. We could equally well have intro- 
duced a different atlas in each case without changing the class of differentiable 
functions, or the class of open sets, or convergent sequences, and so on. We 
therefore introduce an equivalence relation between atlases on $M$:

Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be atlases on $M$. We say that they are equivalent if their
union $\mathcal{A}_1 \cup \mathcal{A}_2$ is again an atlas on $M$.

The crucial condition is that A1 still hold for the union. This means that
for any charts $(U_i, \varphi_i) \in \mathcal{A}_1$ and $(W_j, \psi_j) \in \mathcal{A}_2$ the sets $\varphi_i(U_i \cap W_j)$ and
$\psi_j(U_i \cap W_j)$ are open and $\varphi_i \circ \psi_j^{-1}$ is a differentiable map of $\psi_j(U_i \cap W_j)$ onto
$\varphi_i(U_i \cap W_j)$ with a differentiable inverse.

It is clear that the relation introduced is an equivalence relation. Furthermore,
it is an easy exercise to check that if $f$ is a function of class $C^l$ with respect
to a given atlas, it is of class $C^l$ with respect to any equivalent one. The same is
true for the notions of open set and convergence.




Definition 3.1. A set $M$ together with an equivalence class of atlases on $M$ is
called a differentiable manifold if it satisfies the “Hausdorff property”: For any
two points $x_1 \ne x_2$ of $M$ there are open sets $U_1$ and $U_2$ with $x_1 \in U_1$ and $x_2 \in U_2$ with
$U_1 \cap U_2 = \emptyset$.

In what follows we shall (by abuse of the language) denote a differentiable 
manifold by M, where the equivalence class of atlases is understood. By an 
atlas of M we shall then mean an atlas belonging to the given equivalence class, 
and by a chart of $M$ we shall mean a chart belonging to some atlas of $M$.

We shall also adopt the notational convention that V is the Banach space 
where the charts on $M$ take their values (and shall say that $M$ is a $V$-manifold).
If there are several manifolds, $M_1$, $M_2$, etc., under discussion, we shall denote
the corresponding vector spaces by $V_1$, $V_2$, etc. If $V = \mathbb{R}^n$, we say that $M$ is
an n-dimensional manifold. 

Let $M_1$ and $M_2$ be differentiable manifolds. A map $\varphi: M_1 \to M_2$ is called
continuous if for any open set $U_2 \subset M_2$ the set $\varphi^{-1}(U_2)$ is an open subset of $M_1$.
Let $x_2 \in M_2$, and let $U_2$ be any open set containing $x_2$. If $\varphi(x_1) = x_2$, then
$\varphi^{-1}(U_2)$ is an open set containing $x_1$. If $(W, \alpha)$ is a chart about $x_1$, then
$W \cap \varphi^{-1}(U_2)$ is an open subset of $W$, and $\alpha(W \cap \varphi^{-1}(U_2))$ is an open set in $V_1$
containing $\alpha(x_1)$. Therefore, there exists an $\epsilon > 0$ such that $\varphi(x) \in U_2$ for all
$x \in W$ with $\|\alpha(x) - \alpha(x_1)\| < \epsilon$. In this sense, all points “close to $x_1$” are
mapped “close to $x_2$”. Note that the choice of $\epsilon$ will depend on the chart $(W, \alpha)$
as well as on $x_1$, $x_2$, $U_2$, and $\varphi$.

If $M_1$, $M_2$, and $M_3$ are differentiable manifolds, and if $\varphi: M_1 \to M_2$ and
$\psi: M_2 \to M_3$ are continuous maps, it is easy to see that their composition
$\psi \circ \varphi$ is a continuous map from $M_1$ to $M_3$.

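Let $\varphi$ be a continuous map from $M_1$ to $M_2$. Let $(W_1, \alpha_1)$ be a chart on $M_1$
and $(W_2, \alpha_2)$ a chart on $M_2$. We say that these charts are compatible (under $\varphi$)
if $\varphi(W_1) \subset W_2$. If $\mathcal{A}_2$ is an atlas on $M_2$ and $\mathcal{A}_1$ is an atlas on $M_1$, we say that $\mathcal{A}_1$
and $\mathcal{A}_2$ are compatible under $\varphi$ if for every $(W_1, \alpha_1) \in \mathcal{A}_1$ there exists a
$(W_2, \alpha_2) \in \mathcal{A}_2$ compatible with it, i.e., such that $\varphi(W_1) \subset W_2$. (Note that the
map $\alpha_2 \circ (\varphi \restriction W_1) \circ \alpha_1^{-1}$ is then a continuous map of an open subset of $V_1$ into
$V_2$.) Given $\mathcal{A}_2$ and $\varphi$, we can always find an $\mathcal{A}_1$ compatible with $\mathcal{A}_2$ under $\varphi$.
In fact, let $\mathcal{A}_1'$ be any atlas on $M_1$, and set
$$\mathcal{A}_1 = \{(W \cap \varphi^{-1}(W_2),\ \alpha \restriction (W \cap \varphi^{-1}(W_2)))\},$$
where $(W, \alpha)$ ranges over all charts of $\mathcal{A}_1'$, and $(W_2, \beta)$ ranges over all charts
of $\mathcal{A}_2$.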

Definition 3.2. Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi$ be a
map: $M_1 \to M_2$. We say that $\varphi$ is differentiable if the following hold:

i) $\varphi$ is continuous.

ii) Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases under $\varphi$. Then for any compatible
$(W_1, \alpha_1) \in \mathcal{A}_1$ and $(W_2, \alpha_2) \in \mathcal{A}_2$, the map
$$\alpha_2 \circ \varphi \circ \alpha_1^{-1}: \alpha_1(W_1) \to \alpha_2(W_2)$$
is differentiable (as a map of an open subset of a Banach space into a
Banach space). (See Fig. 9.5.)




Fig. 9.5 


In order to check that a continuous map $\varphi$ is differentiable, it suffices to
check much less than (ii). Condition (ii) relates to any pair of compatible atlases
and any pair of compatible charts. In fact, we can assert:


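Proposition 3.1. Let $\varphi: M_1 \to M_2$ be continuous, and let $\mathcal{A}_1$ and $\mathcal{A}_2$ be
compatible atlases under $\varphi$. Suppose that for every $(W_1, \alpha_1) \in \mathcal{A}_1$ there
exists a $(W_2, \alpha_2) \in \mathcal{A}_2$ with $\varphi(W_1) \subset W_2$ and $\alpha_2 \circ \varphi \circ \alpha_1^{-1}$ differentiable.
Then $\varphi$ is differentiable.

Proof. Let $(U_1, \beta_1)$ and $(U_2, \beta_2)$ be any charts on $M_1$ and $M_2$ with $\varphi(U_1) \subset U_2$.
We must show that $\beta_2 \circ \varphi \circ \beta_1^{-1}$ is differentiable. It suffices to show that it is
differentiable in the neighborhood of every point $\beta_1(x_1)$, where $x_1 \in U_1$. Choose
$(W_1, \alpha_1) \in \mathcal{A}_1$ with $x_1 \in W_1$, and choose $(W_2, \alpha_2) \in \mathcal{A}_2$ with $\varphi(W_1) \subset W_2$.
Then on $\beta_1(W_1 \cap U_1)$, we have
$$\beta_2 \circ \varphi \circ \beta_1^{-1} = (\beta_2 \circ \alpha_2^{-1}) \circ (\alpha_2 \circ \varphi \circ \alpha_1^{-1}) \circ (\alpha_1 \circ \beta_1^{-1}),$$
so that the left-hand side is differentiable. $\square$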


In other words, it suffices to verify differentiability with one pair of atlases. 
We have as a consequence: 


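Proposition 3.2. Let $\varphi: M_1 \to M_2$ and $\psi: M_2 \to M_3$ be differentiable.
Then $\psi \circ \varphi$ is differentiable.

Proof. Let $\mathcal{A}_3$ be an atlas on $M_3$. Choose $\mathcal{A}_2$ compatible with $\mathcal{A}_3$ under $\psi$,
and then choose an atlas $\mathcal{A}_1$ on $M_1$ compatible with $\mathcal{A}_2$ under $\varphi$. For any
$(W_1, \alpha_1) \in \mathcal{A}_1$ choose $(W_2, \alpha_2) \in \mathcal{A}_2$ and $(W_3, \alpha_3) \in \mathcal{A}_3$ with $\varphi(W_1) \subset W_2$ and
$\psi(W_2) \subset W_3$. Then $\alpha_3 \circ \psi \circ \varphi \circ \alpha_1^{-1} = (\alpha_3 \circ \psi \circ \alpha_2^{-1}) \circ (\alpha_2 \circ \varphi \circ \alpha_1^{-1})$ is
differentiable. $\square$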


Exercise 3.1. Let $M_1 = S^n$, let $M_2 = P^n$, and let $\varphi: M_1 \to M_2$ be the map sending
each point of the unit sphere into the line it determines. (Note that two antipodal
points of $S^n$ go into the same point of $P^n$.) Construct compatible atlases for $\varphi$ and
show that $\varphi$ is differentiable.


Note that if $f$ is any function on $M$ with values in a Banach space, then $f$ is
differentiable as a function (in the sense of Section 2) if and only if it is differentiable
as a map of manifolds. In particular, let $\varphi: M_1 \to M_2$ be a differentiable
map, and let $f$ be a differentiable function on $M_2$ (defined on some open subset,
say $U_2$). Then $f \circ \varphi$ is a differentiable function on $M_1$ [defined on the open set
$\varphi^{-1}(U_2)$]. Thus $\varphi$ “pulls back” a differentiable function on $M_2$ to $M_1$. From
this point of view we can say that $\varphi$ induces a map from the collection of differentiable
functions on $M_2$ to the collection of differentiable functions on $M_1$. We
shall denote this induced map by $\varphi^*$. Thus
$$\varphi^*: \text{differentiable functions on } M_2 \to \text{differentiable functions on } M_1$$
is given by
$$\varphi^*[f] = f \circ \varphi.$$

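If $\psi: M_2 \to M_3$ is a second differentiable map, then $(\psi \circ \varphi)^*$ goes from functions
on $M_3$ to functions on $M_1$, and we have
$$(\psi \circ \varphi)^* = \varphi^* \circ \psi^* \tag{3.1}$$
(note the change of order). In fact, for $g$ on $M_3$,
$$(\psi \circ \varphi)^*[g] = g \circ (\psi \circ \varphi) = (g \circ \psi) \circ \varphi = \varphi^*[\psi^*[g]].$$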
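
The change of order in (3.1) is easy to see with functions represented as Python callables. The sketch below is an added illustration; the maps `phi`, `psi`, and the function `g` are hypothetical choices on the real line, standing in for maps between three manifolds.

```python
def pullback(phi):
    """Return phi^*, the operator sending a function f to f o phi."""
    return lambda f: (lambda x: f(phi(x)))

def compose(outer, inner):
    """Return the composite map outer o inner."""
    return lambda x: outer(inner(x))

phi = lambda x: x + 1.0      # hypothetical map M_1 -> M_2 (here R -> R)
psi = lambda x: x * x        # hypothetical map M_2 -> M_3 (here R -> R)
g = lambda x: 3.0 * x - 2.0  # hypothetical function on M_3

left = pullback(compose(psi, phi))(g)     # (psi o phi)^*[g]
right = pullback(phi)(pullback(psi)(g))   # phi^*[psi^*[g]]
```

Applying `pullback(psi)` first and `pullback(phi)` second is what matches the composite $\psi \circ \varphi$, in which $\varphi$ acts first.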


Observe that if $\varphi$ is any map from $M_1 \to M_2$ and $f$ is any function defined
on a subset $S_2$ of $M_2$, then the “pullback” $\varphi^*[f] = f \circ \varphi$ is a function defined on
$\varphi^{-1}(S_2)$ of $M_1$. The fact that $\varphi$ is continuous allows us to conclude that if $S_2$ is
open, then so is $\varphi^{-1}(S_2)$. The fact that $\varphi$ is differentiable implies that $\varphi^*[f]$ is
differentiable whenever $f$ is.

The map $\varphi^*$ commutes with all algebraic operations whenever they are
defined. More precisely, suppose $f$ and $g$ take values in the same vector space
and have domains of definition $U_1$ and $U_2$. Then $f + g$ is defined on $U_1 \cap U_2$,
and $\varphi^*[f] + \varphi^*[g]$ is defined on $\varphi^{-1}(U_1 \cap U_2)$, and we clearly have
$$\varphi^*[f + g] = \varphi^*[f] + \varphi^*[g].$$


EXERCISES 


3.2 Let $M_2$ be a finite-dimensional manifold, and let $\varphi: M_1 \to M_2$ be continuous.
Suppose that $\varphi^*[f]$ is differentiable for any (locally defined) differentiable real-valued
function $f$. Conclude that $\varphi$ is differentiable.

3.3 Show that if $\varphi$ is a bounded linear map between Banach spaces, then $\varphi^*$ as
defined above is an extension of $\varphi^*$ as defined in Section 3, Chapter 2.




4. THE TANGENT SPACE 


In this section we are going to construct an “approximating vector space” to a 
differentiable manifold at each point of the manifold. This will allow us to
formulate most of the notions of the differential calculus on manifolds.

Let $M$ be a differentiable manifold, and let $x$ be a point of $M$ (Fig. 9.6).
Let $I \subset \mathbb{R}$ be an interval containing the origin. Let $\varphi$ be a differentiable map
of $I$ into $M$ such that $\varphi(0) = x$. We will call $\varphi$ a (differentiable) curve through $x$.

Let $f$ be any differentiable real-valued function on $M$ defined in a neighborhood
of $x$. Then $\varphi^*[f]$ is differentiable on $\mathbb{R}$ and we can consider its derivative
at the origin. Define the operator $D_\varphi$ by
$$D_\varphi(f) = \left.\frac{d\varphi^*[f]}{dt}\right|_{t=0}.$$

Fig. 9.6

In view of the linearity of $\varphi^*$, the map $f \mapsto D_\varphi(f)$ is linear:
$$D_\varphi(af + bg) = aD_\varphi(f) + bD_\varphi(g).$$
Similarly, we have Leibnitz’s rule:
$$D_\varphi(fg) = f(x)D_\varphi(g) + g(x)D_\varphi(f),$$
which can easily be checked. The functional $D_\varphi$ depends on the curve $\varphi$. If $\psi$
is a second curve, then, in general, $D_\varphi \ne D_\psi$. If, however, $D_\varphi = D_\psi$, then we
say that the curves $\varphi$ and $\psi$ are tangent at $x$, and we write $\varphi \sim \psi$. Thus
$$\varphi \sim \psi \quad \text{if and only if} \quad D_\varphi(f) = D_\psi(f) \ \text{for all differentiable functions } f.$$
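
Leibnitz's rule for the operator $D_\varphi$ can be tested numerically. In the sketch below (an added illustration, not part of the text), the manifold is taken to be $\mathbb{R}^2$ with its identity chart, the curve and the functions `f` and `g` are hypothetical, and the derivative at $t = 0$ is only approximated by a central difference.

```python
import math

def D(curve, f, h=1e-6):
    """Central-difference estimate of d/dt f(curve(t)) at t = 0."""
    return (f(curve(h)) - f(curve(-h))) / (2.0 * h)

# Hypothetical data: M = R^2, and the curve passes through x = curve(0) = (1, 0).
curve = lambda t: (math.cos(t), math.sin(t))
f = lambda p: p[0] + 2.0 * p[1]
g = lambda p: p[0] * p[1] + 1.0
x = curve(0.0)

product = lambda p: f(p) * g(p)
lhs = D(curve, product)                          # D(fg)
rhs = f(x) * D(curve, g) + g(x) * D(curve, f)    # f(x) D(g) + g(x) D(f)
```

For this choice one finds by hand that both sides equal 3, which the finite-difference values reproduce to within the discretization error.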


It is easy to check that $\sim$ is an equivalence relation. An equivalence class
of curves through $x$ will be called a tangent vector at $x$. If $\xi$ is a tangent vector
at $x$ and $\varphi \in \xi$, we say that $\varphi$ is tangent to $\xi$ at $x$.

For any differentiable function $f$ defined about $x$ and any tangent vector $\xi$,
we set
$$\xi(f) = D_\varphi(f),$$
where $\varphi \in \xi$. Thus $\xi$ gives us a functional on differentiable functions defined
about $x$. We have
$$\xi(af + bg) = a\xi(f) + b\xi(g), \tag{4.1}$$
$$\xi(fg) = f(x)\xi(g) + g(x)\xi(f). \tag{4.2}$$


Let us examine what the equivalence relation $\sim$ says in terms of a chart
$(W, \alpha)$ about $x$. The functional $D_\varphi(f)$ can be written as
$$\left.\frac{d(f \circ \varphi)}{dt}\right|_{t=0} = \left.\frac{d\,(f \circ \alpha^{-1}) \circ (\alpha \circ \varphi)}{dt}\right|_{t=0}.$$
If we set $\Phi = \alpha \circ \varphi$ and $F = f \circ \alpha^{-1}$, then $\Phi$ is a parametrized curve in a Banach
space and $F$ is a differentiable function there. We can thus write
$$D_\varphi(f) = dF(\Phi'(0)) = D_{\Phi'(0)}F.$$


From this expression we see (setting $\Psi = \alpha \circ \psi$) that $\varphi \sim \psi$ if and only if
$\Phi'(0) = \Psi'(0)$. We thus see that in terms of a chart $(W, \alpha)$, every tangent
vector $\xi$ at $x$ corresponds to a unique vector $\xi_\alpha \in V$ given by
$$\xi_\alpha = (\alpha \circ \varphi)'(0),$$
where $\varphi \in \xi$.
Conversely, given any $v \in V$, there is a tangent vector $\xi$ with $\xi_\alpha = v$.
In fact, define $\varphi$ by setting $\varphi(t) = \alpha^{-1}(\alpha(x) + tv)$. Then $\varphi$ is defined in a small
enough interval about 0, and $(\alpha \circ \varphi)'(0) = v$.
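In short, a choice of chart allows us to identify the set of all tangent vectors
at $x$ with $V$. Let $(U, \beta)$ be a second chart about $x$. Then
$$\xi_\beta = (\beta \circ \varphi)'(0) = \bigl((\beta \circ \alpha^{-1}) \circ (\alpha \circ \varphi)\bigr)'(0).$$
By the chain rule we thus have
$$\xi_\beta = J_{\beta \circ \alpha^{-1}}(\alpha(x))\,\xi_\alpha, \tag{4.3}$$
where $J_\Psi(p)$ is the differential $d\Psi_p$ of $\Psi$ at $p$.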
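
Equation (4.3) can be tried out with two familiar charts. In the following sketch (an added illustration with hypothetical names), $\alpha$ is the identity chart on the half-plane $x > 0$ of $\mathbb{R}^2$ and $\beta$ is the polar-coordinate chart, so that $J_{\beta \circ \alpha^{-1}}$ is the Jacobian matrix of the polar map; velocities are approximated by central differences.

```python
import math

def beta(p):
    """Hypothetical second chart: polar coordinates on the half-plane x > 0."""
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def velocity(path, h=1e-6):
    """Central-difference estimate of path'(0) for a path into R^2."""
    a, b = path(h), path(-h)
    return tuple((ai - bi) / (2.0 * h) for ai, bi in zip(a, b))

def jacobian_beta(p):
    """Differential of beta o alpha^{-1} at p, where alpha is the identity chart."""
    x, y = p
    r = math.hypot(x, y)
    return ((x / r, y / r), (-y / (r * r), x / (r * r)))

curve = lambda t: (1.0 + t, 1.0 + 2.0 * t + t * t)   # passes through x = (1, 1)
xi_alpha = velocity(curve)                            # components in the chart alpha
xi_beta = velocity(lambda t: beta(curve(t)))          # components in the chart beta

J = jacobian_beta(curve(0.0))
predicted = tuple(sum(J[i][j] * xi_alpha[j] for j in range(2)) for i in range(2))
```

The tuple `predicted` is the right-hand side of (4.3), and `xi_beta` is the left-hand side computed directly from the curve.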

Since $J_{\beta \circ \alpha^{-1}}(\alpha(x))$ is a linear map of $V$ into itself, Eq. (4.3) says that the
set of all tangent vectors at $x$ can be identified with $V$, the identification being
determined up to an automorphism of $V$. In particular, we can make the set
of all tangent vectors at $x$ into a vector space by defining
$$a\xi + b\eta = \zeta,$$
where $\zeta$ is determined by
$$a\xi_\alpha + b\eta_\alpha = \zeta_\alpha$$
for some chart $\alpha$. Equation (4.3) shows that this definition is independent of $\alpha$.
We shall denote the space of tangent vectors at $x$ by $T_x(M)$ and shall call it
the tangent space (to $M$) at $x$.

Let $\psi$ be a differentiable map of $M_1$ to $M_2$, and let $\varphi$ be a curve passing
through $x \in M_1$ (see Fig. 9.7). Then $\psi \circ \varphi$ is a curve passing through $\psi(x) \in M_2$.
It is easy to check that if $\varphi \sim \bar\varphi$, then $\psi \circ \varphi \sim \psi \circ \bar\varphi$. Thus the map $\psi$ induces a
mapping of $T_x(M_1)$ into $T_{\psi(x)}(M_2)$, which we shall denote by $\psi_{*x}$. To repeat,


Fig. 9.7

Fig. 9.8    Fig. 9.9


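if $\xi \in T_x(M_1)$, then $\psi_{*x}(\xi) = \eta$ is determined by
$$\psi \circ \varphi \in \eta \quad \text{for all } \varphi \in \xi.$$
Let $(U, \alpha)$ be a chart about $x$, and let $(W, \beta)$ be a chart about $\psi(x)$. Then
$$\xi_\alpha = (\alpha \circ \varphi)'(0)$$
and
$$\eta_\beta = (\beta \circ \psi \circ \varphi)'(0) = \bigl((\beta \circ \psi \circ \alpha^{-1}) \circ (\alpha \circ \varphi)\bigr)'(0).$$
By the chain rule we can thus write
$$\eta_\beta = J_{\beta \circ \psi \circ \alpha^{-1}}(\alpha(x))\,\xi_\alpha.$$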


This says that if we identify $T_x(M_1)$ with $V_1$ via $\alpha$ and identify $T_{\psi(x)}(M_2)$ with
$V_2$ via $\beta$, then $\psi_{*x}$ becomes identified with the linear map $J_{\beta \circ \psi \circ \alpha^{-1}}(\alpha(x))$. In
particular, the map $\psi_{*x}$ is a continuous linear mapping from $T_x(M_1)$ to $T_{\psi(x)}(M_2)$.
If $\varphi: M_1 \to M_2$ and $\psi: M_2 \to M_3$ are two differentiable mappings, then it
follows immediately from the definitions that
$$(\psi \circ \varphi)_{*x} = \psi_{*\varphi(x)} \circ \varphi_{*x}. \tag{4.4}$$


We have seen that the choice of chart $(U, \alpha)$ identifies $T_x(M)$ with $V$.
Now suppose that $M$ is actually $V$ itself (or an open subset of $V$) regarded as a
differentiable manifold. Then $M$ has a distinguished chart, namely $(M, \mathrm{id})$.
Thus on an open subset of $V$ the identity chart gives us a distinguished way of
identifying $T_x(M)$ with $V$. It is sometimes convenient to picture $T_x(M)$ as a
copy of $V$ whose origin has been translated to $x$. We would then draw a tangent
vector at $x$ as an arrow originating at $x$. (See Fig. 9.8.)

Now suppose that $M$ is a general manifold and that $\psi$ is a differentiable map
of $M$ into a vector space $V_1$. Then $\psi_*(T_x(M))$ is a subspace of $T_{\psi(x)}(V_1)$.
If we regard $\psi_*(T_x(M))$ as a subspace of $V_1$ and consider the corresponding
hyperplane through $\psi(x)$, we get the “plane tangent to $\psi(M)$ at $\psi(x)$” in the intuitive
sense (Fig. 9.9).




It is very convenient to think of tangent vectors in this way, that is, to 
regard them as vectors tangent to $M$ if $M$ were mapped into a vector space.

If $f$ is a real-valued differentiable function defined in a neighborhood $U$
of $x \in M$, then we can regard it as a map of the manifold $U$ to the manifold $\mathbb{R}^1$.
We therefore get a map $f_{*x}: T_x(M) \to T_{f(x)}(\mathbb{R}^1)$. Recall that we identify $T_y(\mathbb{R}^1)$
with $\mathbb{R}^1$ for any $y \in \mathbb{R}^1$. Therefore, $f_{*x}$ can be viewed as a map from $T_x(M)$
to $\mathbb{R}^1$. The reader should check that this map is indeed given by
$$f_{*x}(\xi) = \xi(f) \quad \text{for } \xi \in T_x(M). \tag{4.5}$$
In particular, if we take $M_3 = \mathbb{R}^1$ and $\psi = f$ in (4.4), we can assert:

Let $\psi$ be a differentiable map of $M_1$ to $M_2$, and let $f$ be a differentiable function
on $M_2$ defined in a neighborhood of $\psi(x)$. Then for any $\xi \in T_x(M_1)$,
$$\xi(\psi^*[f]) = \psi_{*x}(\xi)(f). \tag{4.6}$$


From now on, we shall frequently drop the subscript $x$ in $\psi_{*x}$ when it can be
understood from the context. Thus we would write (4.4) as $(\psi \circ \varphi)_* = \psi_* \circ \varphi_*$.
Some authors call the mapping $\psi_{*x}$ the differential of $\psi$ at $x$ and designate it $d\psi_x$.
If $M_1$ and $M_2$ are open subsets of Banach spaces $V_1$ and $V_2$ (and hence are
differentiable manifolds under their identity charts), then $\psi_{*x}$ as defined above
does reduce to the differential $d\psi_x$ when $T_x(M_1)$ is identified with $V_1$. This
reduction does depend on the identification, however.


5. FLOWS AND VECTOR FIELDS 


Let $M_1$ and $M_2$ be differentiable manifolds. A map $\varphi$ from $M_1 \to M_2$ is called a
diffeomorphism if $\varphi$ is a differentiable one-to-one map of $M_1$ onto $M_2$ such that
$\varphi^{-1}$ is also differentiable.

Let $M$ be a differentiable manifold. A map $\varphi: M \times \mathbb{R} \to M$ is called a
one-parameter group if

i) $\varphi$ is differentiable;

ii) $\varphi(x, 0) = x$ for all $x \in M$;

iii) $\varphi(\varphi(x, s), t) = \varphi(x, s + t)$ for all $x \in M$ and $s, t \in \mathbb{R}$.

We can express conditions (ii) and (iii) a little differently. Let $\varphi_t: M \to M$
be given by
$$\varphi_t(x) = \varphi(x, t).$$
For each $t \in \mathbb{R}$ the map $\varphi_t$ is differentiable. In fact,
$$\varphi_t = \varphi \circ \iota_t,$$
where $\iota_t$ is the differentiable map of $M \to M \times \mathbb{R}$ given by $\iota_t(x) = (x, t)$.
Then condition (ii) says that $\varphi_0 = \mathrm{id}$. Condition (iii) says that
$$\varphi_t \circ \varphi_s = \varphi_{t+s}.$$
If we take $t = -s$ in this equation, we get $\varphi_t \circ \varphi_{-t} = \mathrm{id}$. Thus for each $t$
the map $\varphi_t$ is a diffeomorphism and $(\varphi_t)^{-1} = \varphi_{-t}$.


Fig. 9.10


We now give some examples of one-parameter groups. 


Example 1. Let $M = V$ be a vector space, and let $w \in M$. Let $\varphi: V \times \mathbb{R} \to V$
be given by
$$\varphi(v, t) = v + tw.$$
It is easy to check that (i), (ii), and (iii) are satisfied. (See Fig. 9.10.)


Example 2. Let $M = V$ be a finite-dimensional vector space, and let $A$ be a
linear transformation $A: V \to V$. Recall that the linear transformation $e^{tA}$ is
defined by
$$e^{tA} = I + tA + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots,$$
i.e., for any $v \in V$,
$$e^{tA}v = \sum_{j=0}^{\infty} \frac{t^j}{j!}A^jv.$$

Fig. 9.12

Since this series converges uniformly on any
compact set of $\langle v, t\rangle$, the map $\varphi: M \times \mathbb{R} \to M$ given by
$$\varphi(v, t) = e^{tA}v$$
is easily seen to be differentiable and to satisfy (ii) and (iii) as well.
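
Condition (iii) for the flow $\varphi(v, t) = e^{tA}v$ can be checked numerically by summing the exponential series. The sketch below is an added illustration: the helper names, the truncation at 40 terms, and the choice of $A$ as the generator of plane rotations are all assumptions made for the example.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(a, t, terms=40):
    """Partial sum of the series sum_j (t^j / j!) A^j."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        # Multiply the previous term by tA/j to obtain t^j A^j / j!.
        term = [[t * c / j for c in row] for row in mat_mul(term, a)]
        result = [[r + c for r, c in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def flow(a, v, t):
    """phi(v, t) = e^{tA} v."""
    e = mat_exp(a, t)
    return [sum(e[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

A = [[0.0, -1.0], [1.0, 0.0]]   # hypothetical generator: rotation of the plane
v = [1.0, 2.0]
s, t = 0.4, 1.1
two_steps = flow(A, flow(A, v, s), t)   # phi(phi(v, s), t)
one_step = flow(A, v, s + t)            # phi(v, s + t)
```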


Example 3. Let $M$ be the circle $S^1$, and let $a$ be any real number. Let $\varphi_t^a$ be the
diffeomorphism consisting of rotation through angle $ta$. In terms of the atlas
$\mathcal{A} = \{(U_1, \theta_1), (U_2, \theta_2)\}$, the map $\varphi^a$ is given by
$$\theta_1(\varphi^a(x, t)) = \begin{cases} \theta_1(x) + ta, & x \in U_1,\ \theta_1(x) < 2\pi - ta, \\ \theta_1(x) + ta - 2\pi, & x \in U_1,\ \theta_1(x) > 2\pi - ta, \end{cases}$$
$$\theta_2(\varphi^a(x, t)) = \begin{cases} \theta_2(x) + ta, & x \in U_2,\ \theta_2(x) < 2\pi + \pi/2 - ta, \\ \theta_2(x) + ta - 2\pi, & x \in U_2,\ \theta_2(x) > 2\pi + \pi/2 - ta. \end{cases}$$
(Strictly speaking, this doesn’t quite define $\varphi^a$ for all values of $\langle x, t\rangle$. If
$x = \langle 1, 0\rangle$ and $ta = \pi/2$, then $x \notin U_1$ and $\varphi^a(x, t) \notin U_2$. This is easily
remedied by the introduction of a third chart.) It is easy to see that $\varphi^a$ is a
one-parameter group.


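Example 4. Let $M = S^1 \times S^1$ be the torus, and let $a$ and $b$ be real numbers.
Write $x \in M$ as $x = \langle x_1, x_2\rangle$, where $x_i \in S^1$. Define $\varphi^{\langle a,b\rangle}$ by
$$\varphi^{\langle a,b\rangle}(x_1, x_2, t) = \langle\varphi_t^a(x_1), \varphi_t^b(x_2)\rangle,$$
where $\varphi^a$ and $\varphi^b$ are given in Example 3. Then $\varphi^{\langle a,b\rangle}$ is a one-parameter group
and indeed a rather instructive one. The reader should check to see that essentially
different behavior arises according to whether $b/a$ is rational or irrational.
[The construction of Example 4 from Example 3 can be generalized as
follows. If $\varphi: M \times \mathbb{R} \to M$ and $\psi: N \times \mathbb{R} \to N$ are one-parameter groups,
then we can construct a one-parameter group on $M \times N$ given by $\varphi_t \times \psi_t$.
The map of $M \times N \times \mathbb{R} \to M \times N$ sending $\langle x, y, t\rangle \mapsto \langle\varphi_t(x), \psi_t(y)\rangle$ is
differentiable because it can be written as the composite $(\varphi \times \psi) \circ \Delta$, where
$$\varphi \times \psi: M \times \mathbb{R} \times N \times \mathbb{R} \to M \times N,$$
and
$$\Delta: M \times N \times \mathbb{R} \to M \times \mathbb{R} \times N \times \mathbb{R}$$
is given by $\Delta(x, y, t) = \langle x, t, y, t\rangle$.]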


In each of the four preceding examples we started out with an “infinitesimal 
generator” to construct the one-parameter group, namely, the vector w in 
Example 1, the linear transformation $A$ in Example 2, the real number $a$ in
Example 3, and the pair <a, b> in Example 4. We will now show that associated 
with any one-parameter group on a manifold, there is a nice object which we 
can regard as the infinitesimal generator of the one-parameter group. 

Let $\varphi: M \times \mathbb{R} \to M$ be a one-parameter group. For each $x \in M$ consider the map $\varphi_x$ of $\mathbb{R} \to M$ given by

$$\varphi_x(t) = \varphi(x, t).$$

In view of condition (ii), we know that $\varphi_x(0) = x$. Thus $\varphi_x$ is a curve passing through $x$ (see Fig. 9.13). Let us denote the tangent to this curve at $t = 0$ by $X(x)$. We thus get a mapping $X$ which assigns to each $x \in M$ a vector $X(x) \in T_x(M)$. Any such map, i.e., any rule assigning to each $x \in M$ a vector in $T_x(M)$, will be called a vector field. We have seen that every one-parameter group gives rise to a vector field which we shall call the infinitesimal generator of the one-parameter group.


Fig. 9.13


Let $Y$ be a vector field on $M$, and let $(U, \alpha)$ be a chart on $M$. For each $x \in U$ we get a vector $Y(x)_\alpha \in V$. We can regard this as defining a $V$-valued function $Y_\alpha$ on $\alpha(U)$:

$$Y_\alpha(v) = Y(\alpha^{-1}(v))_\alpha, \quad \text{for } v \in \alpha(U). \tag{5.1}$$

Let $(W, \beta)$ be a second chart, and let $Y_\beta$ be the corresponding $V$-valued function on $\beta(W)$. If we compare (5.1) with (4.3), we see that

$$Y_\beta(\beta \circ \alpha^{-1}(v)) = J_{\beta \circ \alpha^{-1}}(v)\, Y_\alpha(v) \quad \text{if } v \in \alpha(U \cap W). \tag{5.2}$$

Equation (5.1) gives the "local expression" of a vector field with respect to a chart, and Eq. (5.2) describes the "transition law" from one chart to another.

Conversely, let $\mathcal{A}$ be an atlas of $M$, and let $Y_\alpha$ be a $V$-valued function defined on $\alpha(U)$ for each chart $(U, \alpha) \in \mathcal{A}$. Suppose that the $Y_\alpha$ satisfy (5.2). Then for each $x \in M$ we can let $Y(x) \in T_x(M)$ be defined by setting

$$Y(x)_\alpha = Y_\alpha(\alpha(x))$$

for some chart $(U, \alpha)$ about $x$. It follows from the transition law given by (4.3) and (5.2) that this definition does not depend on the choice of $(U, \alpha)$.

Observe that $J_{\beta \circ \alpha^{-1}}$ is a $C^\infty$-function (linear transformation-valued function) on $\alpha(U \cap W)$. Therefore, if $Y$ is a vector field and $Y_\alpha$ is a $V$-valued $C^\infty$-function on $\alpha(U)$, the function $Y_\beta$ will be $C^\infty$ on $\beta(U \cap W)$. In other words, it is consistent to require that the functions $Y_\alpha$ be of class $C^\infty$. We shall therefore say that $Y$ is a $C^\infty$-vector field if the function $Y_\alpha$ is $C^\infty$ for every chart $(U, \alpha)$. As in the case of functions and mappings, in order to verify that $Y$ is $C^\infty$, it suffices to check that the $Y_\alpha$ are $C^\infty$ for all charts $(U, \alpha)$ belonging to some atlas of $M$.

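Let us check that the infinitesimal generator $X$ of a one-parameter group $\varphi$ is a $C^\infty$-vector field. In fact, if $(U, \alpha)$ is a chart, then

$$X_\alpha(v) = (\alpha \circ \varphi_x)'(0),$$

where $\varphi_x(t) = \varphi(x, t)$ and $v = \alpha(x)$. We can write $\alpha \circ \varphi_x(t) = \Phi_\alpha(v, t)$, where

$$\Phi_\alpha(v, t) = \alpha \circ \varphi(\alpha^{-1}(v), t).$$

Let $U' \subset U$ be a neighborhood of $x$ such that $\varphi(y, t) \in U$ for $y \in U'$ and $|t| < \epsilon$. Then $\Phi_\alpha$ is a differentiable map of $\alpha(U') \times I \to \alpha(U)$, where $I = \{t : |t| < \epsilon\}$. In terms of this representation, we can write

$$X_\alpha(v) = \frac{\partial \Phi_\alpha}{\partial t}(v, 0). \tag{5.3}$$

This shows that $X$ is a $C^\infty$-vector field.

If we evaluate (5.3) in the case of Example 1, we get $\Phi_{\mathrm{id}}(v, t) = v + tw$, so that $X_{\mathrm{id}} = w$. In the case of Example 2 we get $X_{\mathrm{id}}(v) = Av$.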
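
The evaluation of (5.3) for Example 2 can be checked numerically. The following is only a minimal sketch under stated assumptions: the matrix `A`, the vector `v`, the number of series terms, and the helper names `mat_vec`, `exp_tA_v`, and `generator` are all hypothetical choices made for illustration, and the derivative at $t = 0$ is approximated by a central difference rather than computed exactly.

```python
def mat_vec(A, v):
    """Multiply the square matrix A (list of rows) by the vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def exp_tA_v(A, v, t, terms=30):
    """Apply the truncated series sum_j t^j A^j v / j! to the vector v."""
    total = list(v)
    term = list(v)
    for j in range(1, terms):
        # term becomes t^j A^j v / j! by multiplying the previous term by tA/j
        term = [t * c / j for c in mat_vec(A, term)]
        total = [a + b for a, b in zip(total, term)]
    return total


def generator(flow, v, h=1e-6):
    """Central-difference approximation of d/dt flow(v, t) at t = 0."""
    plus, minus = flow(v, h), flow(v, -h)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]


# Hypothetical data: A generates rotations of the plane.
A = [[0.0, -1.0], [1.0, 0.0]]
v = [2.0, 3.0]

X_v = generator(lambda u, t: exp_tA_v(A, u, t), v)
Av = mat_vec(A, v)
print(X_v, Av)
```

Under these assumptions the two printed vectors should agree up to the finite-difference error, which is the content of $X_{\mathrm{id}}(v) = Av$.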

There are various algebraic operations that can be performed with vector fields. The set of all vector fields on $M$ forms a vector space in the obvious way. If $X$ and $Y$ are $C^\infty$-vector fields, then so is $aX + bY$ ($a$ and $b$ are constants), where

$$(aX + bY)(x) = aX(x) + bY(x), \quad x \in M.$$

Similarly, we can multiply a vector field by a function. If $f$ is a function and $X$ is a vector field, we define $fX$ by

$$(fX)(x) = f(x)X(x), \quad x \in M.$$

It is easy to see that if $f$ and $X$ are differentiable, then so is $fX$. It is also easy to check the various associative laws for this multiplication.

We have seen that any one-parameter group defines a smooth vector field. Let us examine the converse. Does any $C^\infty$-vector field define a one-parameter group? The answer to the question as stated is "no".

In fact, let $X = \partial/\partial x^1$ be the vector field corresponding to translation in the $x^1$-direction in $\mathbb{R}^2$. Let $M = \mathbb{R}^2 - C$, where $C$ is some nonempty closed set of $\mathbb{R}^2$. Then if $p$ is any point of $M$ that lies on a line parallel to the $x^1$-axis which intersects $C$ (Fig. 9.14), then $\varphi_t(p)$ will not be defined (will not lie in $M$) for every $t$.

The reader may object that $M$ "has some points missing" and that is why $X$ does not generate a one-parameter group. But we can construct a counterexample on $\mathbb{R}^2$ itself. In fact, if we consider the vector field $X$ on $\mathbb{R}^2$ given by

$$X_{\mathrm{id}}(x^1, x^2) = \langle 1, -(x^2)^2\rangle,$$

then (5.3) shows that $\varphi$, if defined, satisfies

$$\frac{d\Phi}{dt}(x, t) = \frac{\partial \Phi}{\partial t}(\Phi(x, t), 0) = X(\Phi(x, t)),$$

where $\Phi = \Phi_{\mathrm{id}}$. If we let $y^i(t, x) = x^i \circ \Phi(x, t)$, then

$$\frac{dy^1}{dt} = 1, \quad y^1(0) = x^1,$$

$$\frac{dy^2}{dt} = -(y^2)^2, \quad y^2(0) = x^2.$$

If $x^2 \neq 0$, then the unique solution of the second equation is given by

$$y^2(t) = \frac{1}{t + 1/x^2},$$

which is not defined for all values of $t$. Of course, the trouble is that we only have a local existence theorem for differential equations.

Fig. 9.14
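
The failure of global existence can be seen numerically. The sketch below is an illustration only: it integrates $dy/dt = -y^2$ with a classical fourth-order Runge-Kutta step (a technique chosen here for convenience, not one used in the text) and compares the result with the closed form $1/(t + 1/y_0)$. The initial value `y0`, the step count, and the function names are hypothetical.

```python
def rk4_step(f, y, h):
    """One classical Runge-Kutta step for the autonomous equation y' = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2)
    k4 = f(y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(y0, t_end, steps=1000):
    """Integrate y' = -y^2 from t = 0 to t = t_end (t_end may be negative)."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        y = rk4_step(lambda u: -u * u, y, h)
    return y


def exact(y0, t):
    """Closed-form solution 1 / (t + 1/y0), valid while t + 1/y0 != 0."""
    return 1.0 / (t + 1.0 / y0)


y0 = 2.0                      # hypothetical initial value; blow-up time is t = -1/y0 = -0.5
numeric_forward = integrate(y0, 1.0)
exact_forward = exact(y0, 1.0)
numeric_backward = integrate(y0, -0.45)   # close to the blow-up time
exact_backward = exact(y0, -0.45)
print(numeric_forward, exact_forward)
print(numeric_backward, exact_backward)
```

With these choices the solution exists for all $t > -1/2$ but grows without bound as $t$ decreases toward $-1/2$, so no flow map $\varphi_t$ with $t \le -1/2$ can be defined at this point.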

We must therefore give up on the requirement that $\varphi$ be defined on all of $M \times \mathbb{R}$.


Definition 5.1. A flow on $M$ is a map $\varphi$ of an open set $U \subset M \times \mathbb{R} \to M$ such that

i) $M \times \{0\} \subset U$;

ii) $\varphi$ is differentiable;

iii) $\varphi(x, 0) = x$;

iv) $\varphi(\varphi(x, s), t) = \varphi(x, s + t)$ whenever both sides of this equation are defined.

For $x$ fixed, $\varphi_x(t) = \varphi(x, t)$ is defined for sufficiently small $t$, so that $\varphi$ gives rise to a vector field $X$ as before. We shall call $X$ the infinitesimal generator of the flow $\varphi$.

As the previous examples show, there may be no $t \neq 0$ such that $\varphi(x, t)$ is defined for all $x$, and there may be no $x$ such that $\varphi(x, t)$ is defined for all $t$.


Proposition 5.1. Let $X$ be a smooth vector field on $M$. Then there exists a neighborhood $U$ of $M \times \{0\}$ in $M \times \mathbb{R}$ and a flow $\varphi: U \to M$ having $X$ as its infinitesimal generator.

Proof. We shall first construct the curve $\varphi_x(t)$ for any $x \in M$, and shall then verify that $\langle x, t\rangle \mapsto \varphi_x(t)$ is indeed a flow.

Let $x$ be a point of $M$, and let $(U, \alpha)$ be a chart about $x$. Then $X_\alpha$ gives us an ordinary differential equation in $\alpha(U)$, namely,

$$\frac{dv}{dt} = X_\alpha(v), \quad v \in \alpha(U).$$

By the fundamental existence theorem for ordinary differential equations, there exists an $\epsilon > 0$, an open set $O$ containing $\alpha(x)$, and a map

$$\Phi_\alpha: O \times \{t : |t| < \epsilon\} \to \alpha(U)$$

such that $\Phi_\alpha$ is $C^\infty$, $\Phi_\alpha(v, 0) = v$, and

$$\frac{d\Phi_\alpha(v, t)}{dt} = X_\alpha(\Phi_\alpha(v, t)).$$

Here the choice of the open set $O$ and of $\epsilon$ depends on $\alpha(x)$ and $\alpha(U)$. The uniqueness part of the theorem asserts that $\Phi_\alpha$ is uniquely determined up to the domain of definition; i.e., if $\Phi_v$ is any curve defined for $|t| < \epsilon'$ with $\Phi_v(0) = v$ and

$$\frac{d\Phi_v(t)}{dt} = X_\alpha(\Phi_v(t)), \tag{5.4}$$

then $\Phi_v(t) = \Phi_\alpha(v, t)$.

This implies that

$$\Phi_\alpha(v, t + s) = \Phi_\alpha(\Phi_\alpha(v, s), t)$$

whenever both sides are defined. (Just hold $s$ fixed in the equation.)

Consider the curve $\varphi_x(\cdot)$ defined by

$$\varphi_x(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x), t)). \tag{5.5}$$

It is defined for $|t| < \epsilon$, and is a continuous, in fact differentiable, map of this interval into $M$. Furthermore, if we write $\psi = \varphi_x(\cdot)$ then (5.4) asserts that the tangent vector to the curve $\psi(t + \cdot)$ is $X(\psi(t))$, the value of the vector field at the point $\psi(t)$. We will write this condition as

$$\psi'(t) = X(\psi(t)). \tag{5.6}$$


Equation (5.6) is the way we would write the "first order differential equation"
on M corresponding to the vector field X. A differentiable curve satisfying 
(5.6) is called an integral curve of X. We now can formulate a manifold version 
of the uniqueness theorem of differential equations: 


Lemma 5.1. Let $\psi_1: I \to M$ and $\psi_2: I \to M$ be two integral curves of $X$ defined on the same interval $I$. If $\psi_1(s) = \psi_2(s)$ at some point $s \in I$ then $\psi_1 = \psi_2$, i.e., $\psi_1(t) = \psi_2(t)$ for all $t \in I$.

Proof. We wish to show that the set where $\psi_1(t) \neq \psi_2(t)$ is empty. Let

$$A = \{t : t > s \text{ and } \psi_1(t) \neq \psi_2(t)\}.$$

We wish to show that $A$ is empty, and similarly that the set $B = \{t : t < s \text{ and } \psi_1(t) \neq \psi_2(t)\}$ is empty. Suppose that $A$ is not empty, and let

$$t_+ = \operatorname{glb} A = \operatorname{glb}\{t : t > s \text{ and } \psi_1(t) \neq \psi_2(t)\}.$$

We will derive a contradiction by

i) using the uniqueness theorem for differential equations to show that $\psi_1(t_+) \neq \psi_2(t_+)$, and

ii) using the Hausdorff property of manifolds to show that $\psi_1(t_+) = \psi_2(t_+)$.

Details: i) Suppose that $\psi_1(t_+) = \psi_2(t_+) = y \in M$. We can find a coordinate chart $(W, \beta)$ about $y$, and then $\beta \circ \psi_1$ and $\beta \circ \psi_2$ are solutions of the same system of first order ordinary differential equations, and they both take on the value $\beta(y)$ at $t = t_+$. Hence, by uniqueness for differential equations, $\beta \circ \psi_1$ and $\beta \circ \psi_2$ must be equal in some interval about $t_+$, and hence $\psi_1(t) = \psi_2(t)$ for all $t$ in this interval. This is impossible since there must be points arbitrarily close to $t_+$ where $\psi_1(t) \neq \psi_2(t)$ by the glb property of $t_+$. This proves i). ii) Now suppose that $\psi_1(t_+) \neq \psi_2(t_+)$. We can find neighborhoods $U_1$ of $\psi_1(t_+)$ and $U_2$ of $\psi_2(t_+)$ such that $U_1 \cap U_2 = \emptyset$. But then the continuity of the $\psi_i$ implies that $\psi_1(t) \in U_1$ and $\psi_2(t) \in U_2$ for $t$ close enough to $t_+$, and hence that $\psi_1(t) \neq \psi_2(t)$ for $t$ in some interval about $t_+$. This once again contradicts the glb property of $t_+$, proving ii). The same argument with glb replaced by lub shows that $B$ is empty, proving Lemma 5.1. The above argument is typical of a "connectedness argument." We showed that the set where $\psi_1(t) = \psi_2(t)$ is both open and closed, and hence must be the whole interval $I$.

Lemma 5.1 shows that (5.5) defines a solution curve of $X$ passing through $x$ at time $t = 0$, and is independent of the choice of chart in any common interval of definition about 0. In other words it is legitimate to define the curve $\varphi_x(\cdot)$ by

$$\varphi_x(t) = \alpha^{-1}(\Phi_\alpha(\alpha(x), t)),$$

which defines $\varphi_x(t)$ for $|t| < \epsilon$. Unfortunately the $\epsilon$ depends not only on $x$ but also on the choice of chart. We use Lemma 5.1 and extend the definition of $\varphi_x(\cdot)$ as far as possible, much as we did for ordinary differential equations on a vector space in Chapter 6: For any $s$ with $|s| < \epsilon$ we let $y = \varphi_x(s)$ and obtain a curve $\varphi_y(\cdot)$ defined for $|t| < \epsilon'$. By Lemma 5.1

$$\varphi_y(t) = \varphi_x(s + t) \quad \text{if } |s + t| < \epsilon. \tag{5.7}$$

It may happen that $|s| + \epsilon' > \epsilon$. Then there will exist a $t$ with $|t| < \epsilon'$ and $|s + t| > \epsilon$. Then the right-hand side of (5.7) is not defined, but the left is. We then take (5.7) as the definition of $\varphi_x(s + t)$, extending the domain of definition
of $\varphi_x(\cdot)$. We then continue: Let $I_x^+$ denote the set of all $s > 0$ for which there exists a finite sequence of real numbers $s_0 = 0 < s_1 < \cdots < s_k = s$ and points $x_0, \ldots, x_{k-1} \in M$ with $x_0 = x$, $s_1$ in the domain of definition of $\varphi_{x_0}(\cdot)$, $x_1 = \varphi_{x_0}(s_1)$ and, inductively,

$$s_{i+1} - s_i \text{ in the domain of definition of } \varphi_{x_i}(\cdot) \quad \text{and} \quad x_{i+1} = \varphi_{x_i}(s_{i+1} - s_i).$$

If $s \in I_x^+$, so is $s'$ for $0 < s' < s$, and so is $s + \eta$ for sufficiently small positive $\eta$. Thus $I_x^+$ is an interval, half open on the right. By repeated use of (5.4) we define $\varphi_x(s)$ for $s \in I_x^+$. We construct $I_x^-$ in a similar fashion and set $I_x = I_x^+ \cup I_x^-$. Then $\varphi_x(s)$ is defined for $s \in I_x$, and $I_x$ is the maximal interval for which our construction defines $\varphi_x$. For each $x \in M$ we obtain an open interval $I_x$ in which the curve $\varphi_x(\cdot)$ is defined.

Let $U = \bigcup_{x \in M} \{x\} \times I_x$. Then $U$ is an open subset of $M \times \mathbb{R}$. To verify this, let $(\bar{x}, \bar{s}) \in U$. We must show that there is a neighborhood $W$ of $\bar{x}$ and an $\epsilon > 0$ such that $s \in I_x$ for all $|s - \bar{s}| < \epsilon$ and $x \in W$. By definition, there is a finite sequence of points $\bar{x} = \bar{x}_0, \bar{x}_1, \ldots, \bar{x}_k$ and charts $(U_1, \alpha_1), \ldots, (U_k, \alpha_k)$ with $\bar{x}_{i-1} \in U_i$ and $\bar{x}_i \in U_i$ and such that

$$\alpha_i(\bar{x}_i) = \Phi_{\alpha_i}(\alpha_i(\bar{x}_{i-1}), t_i),$$

where $t_1 + \cdots + t_k = \bar{s}$. It is now clear from the continuity properties of the $\Phi_{\alpha_i}$ that if we choose $x_0$ such that $\alpha_1(x_0)$ is close enough to $\alpha_1(\bar{x}_0)$, then the points $x_i$ defined inductively by

$$\alpha_i(x_i) = \Phi_{\alpha_i}(\alpha_i(x_{i-1}), t_i)$$

will be well defined. [That is, $\alpha_i(x_{i-1})$ will be in the domain of definition of $\Phi_{\alpha_i}(\cdot, t_i)$.] This means that $\bar{s} \in I_{x_0}$ for all such points $x_0$. The same argument shows that $\bar{s} + \eta \in I_x$ for $\eta$ sufficiently small and $x$ sufficiently close to $\bar{x}$. This shows that $U$ is open.

Now define $\varphi$ by setting

$$\varphi(x, t) = \varphi_x(t) \quad \text{for } (x, t) \in U.$$

That $\varphi$ is differentiable near $M \times \{0\}$ follows from the fact that $\varphi$ is given (in terms of a chart) as the solution of an ordinary differential equation. The fundamental existence theorem then guarantees the differentiability. Near the point $(\bar{x}, \bar{t})$ we can write

$$\varphi(x, t) = \varphi(\cdots\varphi(\varphi(x, t_1), t_2)\cdots, t_k), \quad t = t_1 + t_2 + \cdots + t_k,$$

and so $\varphi$ is differentiable because it is the composite of differentiable maps. $\blacksquare$


6. LIE DERIVATIVES 


Let $\varphi$ be a one-parameter group on a manifold $M$, and let $f$ be a differentiable function on $M$. Then for each $t$ the function $\varphi_t^*[f]$ is differentiable, and for $t \neq 0$ we can form the function

$$\frac{\varphi_t^*[f] - f}{t}. \tag{6.1}$$

We claim that the limit of this expression as $t \to 0$ exists. In fact, for any $x \in M$, $\varphi_t^*[f](x) = f \circ \varphi_t(x) = f \circ \varphi_x(t)$ and, therefore,

$$\lim_{t \to 0} \frac{\varphi_t^*[f](x) - f(x)}{t} = \lim_{t \to 0} \frac{f \circ \varphi_x(t) - f \circ \varphi_x(0)}{t} = D_{\varphi_x} f = X(x)f. \tag{6.2}$$

Here $X(x)$ is a tangent vector at $x$ and we are using the notation introduced in Section 4. We shall call the limit of (6.1) the derivative of $f$ with respect to the one-parameter group $\varphi$, and shall denote it by $D_X f$. More generally, for any smooth vector field $X$ and differentiable function $f$ we define $D_X f$ by

$$D_X f(x) = X(x)f \quad \text{for all } x \in M, \tag{6.3}$$

and call it the Lie derivative of $f$ with respect to $X$. In terms of the flow generated by $X$, we can, near any $x \in M$, represent $D_X f$ as the limit of (6.1), where, in general, (6.1) will only be defined for a sufficiently small neighborhood of $x$ and for sufficiently small $|t|$.
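
As a concrete check of the limit (6.1), the following minimal sketch evaluates the difference quotient for a hypothetical choice of flow and function: the rotation group of the plane, whose infinitesimal generator is $X = -y\,\partial/\partial x + x\,\partial/\partial y$, and $f(x, y) = x^2 y$. The names `rotation_flow`, `f`, and `difference_quotient` are illustrative assumptions, not notation from the text.

```python
import math


def rotation_flow(p, t):
    """phi_t: rotation of the plane through angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))


def f(p):
    """Hypothetical test function f(x, y) = x^2 y."""
    x, y = p
    return x * x * y


def difference_quotient(flow, func, p, t):
    """The quotient (phi_t^*[f](p) - f(p)) / t from (6.1)."""
    return (func(flow(p, t)) - func(p)) / t


def lie_derivative_exact(p):
    """X f for X = -y d/dx + x d/dy and f = x^2 y: -y*(2xy) + x*(x^2)."""
    x, y = p
    return -y * (2 * x * y) + x * (x * x)


p = (1.0, 2.0)
quotients = [difference_quotient(rotation_flow, f, p, t) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
exact = lie_derivative_exact(p)
print(quotients, exact)
```

The quotients should approach the value of $X(x)f$ computed directly from (6.3) as $t$ shrinks, with an error roughly proportional to $t$.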

Our notation generalizes the notation in Chapter 3 for directional derivative. In fact, if $M$ is an open subset of $V$ and $X$ is the "constant vector field" of Example 1,

$$X_{\mathrm{id}} = w,$$

then

$$(D_X f)_{\mathrm{id}} = D_w f_{\mathrm{id}},$$

where $D_w$ is the directional derivative with respect to $w$.

Note that $D_X f$ is linear in $X$; that is,

$$D_{aX + bY} f = aD_X f + bD_Y f$$

if $X$ and $Y$ are vector fields and $a$ and $b$ are constants.
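Let $\psi$ be a diffeomorphism of $M_1$ onto $M_2$, and let $X$ be a vector field on $M_2$. We define the "pullback" vector field $\psi^*[X]$ on $M_1$ by setting

$$\psi^*[X](x) = \psi_{*x}^{-1} X(\psi(x)) \quad \text{for all } x \in M_1. \tag{6.4}$$

Note that $\psi$ must be a diffeomorphism for (6.4) to make sense, since $\psi^{-1}$ enters into the definition. This is in contrast to the "pullback" for functions, which made sense for any differentiable map. Equation (6.4) does indeed define a vector field, since

$$\psi_{*x}^{-1} = (\psi^{-1})_{*\psi(x)}: T_{\psi(x)}(M_2) \to T_x(M_1) \quad \text{and} \quad X(\psi(x)) \in T_{\psi(x)}(M_2).$$

Let us check that $\psi^*[X]$ is a smooth vector field if $X$ is. To this effect, let $\mathcal{A}_1$ and $\mathcal{A}_2$ be compatible atlases on $M_1$ and $M_2$, and let $(U, \alpha) \in \mathcal{A}_1$ and $(W, \beta) \in \mathcal{A}_2$ be compatible charts. Then (6.4) says that

$$\psi^*[X]_\alpha(v) = J_{\alpha \circ \psi^{-1} \circ \beta^{-1}}(\beta \circ \psi \circ \alpha^{-1}(v)) \cdot X_\beta(\beta \circ \psi \circ \alpha^{-1}(v)) \quad \text{for } v \in \alpha(U),$$

which is a differentiable function of $v$. Since, by the chain rule,

$$J_{\alpha \circ \psi^{-1} \circ \beta^{-1}}(\beta \circ \psi \circ \alpha^{-1}(v)) \cdot J_{\beta \circ \psi \circ \alpha^{-1}}(v) = I,$$

we can rewrite the last expression more simply as

$$\psi^*[X]_\alpha(v) = (J_{\beta \circ \psi \circ \alpha^{-1}}(v))^{-1} X_\beta(\beta \circ \psi \circ \alpha^{-1}(v)) \quad \text{for } v \in \alpha(U). \tag{6.5}$$

Thus $\psi^*[X]_\alpha$ is the product of a smooth $\operatorname{Hom}(V_2, V_1)$-valued function and a smooth $V_2$-valued function, which shows that $\psi^*[X]$ is a smooth vector field.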


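Exercise. Let $\varphi$ be the flow generated by $X$ on $M_2$. Show that the flow generated by $\psi^*[X]$ is given by

$$\langle x, t\rangle \mapsto \psi^{-1}(\varphi(\psi(x), t)). \tag{6.6}$$

If $\varphi$ is a one-parameter group, then we can write (6.6) as

$$\langle x, t\rangle \mapsto \psi^{-1} \circ \varphi_t \circ \psi(x). \tag{6.6'}$$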


Fig. 9.15. The vector field $X$ ($Xf = \partial f/\partial x$), $X(u, v) = \langle 1, 0\rangle$, and the vector field $Y$ ($Yf = x\,\partial f/\partial y$), $Y(u, v) = \langle 0, u\rangle$.


It is easy to check that if $\psi_1: M_1 \to M_2$ and $\psi_2: M_2 \to M_3$ are diffeomorphisms and $Y$ is a vector field on $M_3$, then

$$(\psi_2 \circ \psi_1)^* Y = \psi_1^* \psi_2^* Y.$$


Fig. 9.15 (cont.)

If $f$ is a differentiable function on $M_2$, then

$$D_{\psi^*X}(\psi^*[f]) = \psi^*(D_X f). \tag{6.7}$$

In fact, by (6.3) and (4.6) we have, for $x \in M_1$,

$$\begin{aligned} D_{\psi^*X}(\psi^*[f])(x) &= \psi^*[X](x)\,\psi^*[f] && \text{by (6.3)} \\ &= \psi_{*x}^{-1} X(\psi(x))\,\psi^*[f] && \text{by (6.4)} \\ &= (\psi_{*x}\psi_{*x}^{-1} X(\psi(x)))f && \text{by (4.6)} \\ &= X(\psi(x))f \\ &= \psi^*(D_X f)(x). \end{aligned}$$


Let $\varphi$ be a one-parameter group on $M$ with infinitesimal generator $X$, and let $Y$ be another smooth vector field on $M$. For $t \neq 0$ we can form the vector field

$$\frac{\varphi_t^* Y - Y}{t} \tag{6.8}$$

and investigate its limit as $t \to 0$, which we shall call $D_X Y$. In Fig. 9.15 we have shown the calculation of $D_X Y$ and $D_Y X$ for two very simple fields on the Cartesian plane $\mathbb{R}^2$. The field $X$ is the constant field $X_{\mathrm{id}} = \delta_1$, so that $Xf = \partial f/\partial x$ in terms of Cartesian coordinates $x, y$. The corresponding flow is given by $\varphi_t(x, y) = \langle x + t, y\rangle$. Thus $\varphi_{t*} = \mathrm{id}$ if we identify the tangent space at each point of the plane with the plane itself. Then $Y \mapsto \varphi_t^* Y$ consists of "moving" the vector field $Y$ to the left by $t$ units. Here we have taken $Y = x\delta_2$, so that $Yf = x(\partial f/\partial y)$. In Fig. 9.15(c) we have pictured $\varphi_t^* Y$, and have superimposed $Y$ and $\varphi_t^* Y$ in Fig. 9.15(d). Figure 9.15(e) represents $\varphi_t^* Y - Y$ and Fig. 9.15(f) is $(1/t)\{\varphi_t^* Y - Y\}$, which coincides with its limit, $D_X Y$, since the expression is independent of $t$. The one-parameter group generated by $Y$ is $\psi$, where $\psi_t(x, y) = \langle x, y + tx\rangle$. Here at any $p \in \mathbb{R}^2$ we have $\psi_{t*}(\delta_1) = \delta_1 + t\delta_2$, so that $\psi_t^* X = \psi_{t*}^{-1} X(\psi_t(x)) = \delta_1 - t\delta_2$. In Fig. 9.15(g) we have drawn $\psi_t^* X$ and in Fig. 9.15(h) we have superimposed it on $X$. Note that $D_X Y = -D_Y X$. However, these two derivatives are nonzero for quite different reasons. The field $\varphi_t^* Y$ varies with $t$ because the field $Y$ is not constant. The field $\psi_t^* X$ varies with $t$ because of "distortion" in the flow $\psi$. See Fig. 9.15(g) and (h). In the general case, $D_X Y$ will result from a superposition of these two effects. We now make the general calculation.
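
The two quotients of the example above can be evaluated directly. This is only an illustrative sketch: the helper names `pullback` and `quotient`, the sample point `p`, and the value `t = 0.25` are hypothetical, and the inverse Jacobians of the two flows are entered by hand from the formulas $\varphi_t(x, y) = \langle x + t, y\rangle$ and $\psi_t(x, y) = \langle x, y + tx\rangle$.

```python
def pullback(flow, jac_inv, field, t, p):
    """(flow_t)^* field at p: inverse Jacobian of flow_t applied to field(flow_t(p))."""
    w = field(flow(t, p))
    J = jac_inv(t, p)
    return (J[0][0] * w[0] + J[0][1] * w[1], J[1][0] * w[0] + J[1][1] * w[1])


def quotient(flow, jac_inv, field, t, p):
    """The difference quotient ((flow_t)^* field - field) / t at p, as in (6.8)."""
    a = pullback(flow, jac_inv, field, t, p)
    b = field(p)
    return ((a[0] - b[0]) / t, (a[1] - b[1]) / t)


# The two fields of the example: X = delta_1 and Y = x * delta_2.
X = lambda p: (1.0, 0.0)
Y = lambda p: (0.0, p[0])

# Flow of X and its inverse Jacobian (the identity matrix).
phi = lambda t, p: (p[0] + t, p[1])
phi_jac_inv = lambda t, p: ((1.0, 0.0), (0.0, 1.0))

# Flow of Y; its Jacobian is [[1, 0], [t, 1]], with inverse [[1, 0], [-t, 1]].
psi = lambda t, p: (p[0], p[1] + t * p[0])
psi_jac_inv = lambda t, p: ((1.0, 0.0), (-t, 1.0))

p = (0.7, -1.3)   # hypothetical sample point
DXY = quotient(phi, phi_jac_inv, Y, 0.25, p)
DYX = quotient(psi, psi_jac_inv, X, 0.25, p)
print(DXY, DYX)
```

Because both quotients are independent of $t$ in this example, any nonzero `t` gives the limit itself, and the two results are negatives of each other, in line with $D_X Y = -D_Y X$.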

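Let $(U, \alpha)$ be a chart on $M$, and for $v \in \alpha(U)$ let $O$ be a sufficiently small open set containing $v$, and let $\epsilon > 0$ be sufficiently small, so that $\Phi_\alpha$ given by

$$\Phi_\alpha(w, t) = \alpha \circ \varphi(\alpha^{-1}(w), t)$$

is defined for $w \in O$ and $|t| < \epsilon$. Then, for $|t| < \epsilon$, Eq. (6.5) implies that

$$(\varphi_t^* Y)_\alpha(v) = (J_{\Phi_\alpha(\cdot, t)}(v))^{-1}\, Y_\alpha(\Phi_\alpha(v, t)), \tag{6.9}$$

where $J_{\Phi_\alpha(\cdot, t)}(v)$ is the Jacobian of the map $w \mapsto \Phi_\alpha(w, t)$ at $v$.

The right-hand side of this equation is of the form $A_t^{-1} z_t$, where $A_t$ and $z_t$ are differentiable functions of $t$ with $A_0 = I$. Therefore, its derivative with respect to $t$ exists and

$$\begin{aligned} \left.\frac{d(A_t^{-1} z_t)}{dt}\right|_{t=0} &= \lim_{t \to 0} \frac{A_t^{-1} z_t - z_0}{t} \\ &= \lim_{t \to 0} A_t^{-1}\left(\frac{z_t - A_t z_0}{t}\right) \\ &= \lim_{t \to 0} \frac{z_t - A_t z_0}{t} \\ &= \lim_{t \to 0} \frac{z_t - z_0}{t} - \lim_{t \to 0} \frac{A_t z_0 - z_0}{t}. \end{aligned}$$

Now in (6.9) $z_t = Y_\alpha(\Phi_\alpha(v, t))$, so

$$\left.\frac{dz_t}{dt}\right|_{t=0} = dY_\alpha\left(\frac{\partial \Phi_\alpha}{\partial t}(v, 0)\right) = dY_\alpha(X_\alpha(v)).$$

Here $Y_\alpha$ is a $V$-valued function, so $dY_\alpha$ is its differential at the point $\Phi_\alpha(v, 0) = v$. Hence $dY_\alpha(X_\alpha(v))$ is the value of this differential at $X_\alpha(v)$. The transformation $A_t = J_{\Phi_\alpha(\cdot, t)}(v) = d(\Phi_\alpha(\cdot, t))_v$, so

$$\left.\frac{dA_t}{dt}\right|_{t=0} = d\left(\frac{\partial \Phi_\alpha}{\partial t}(\cdot, 0)\right)_v = d(X_\alpha)_v.$$

Thus, since $z_0 = Y_\alpha(v)$, the derivative of (6.9) at $t = 0$ can be written as

$$d(Y_\alpha)_v(X_\alpha(v)) - d(X_\alpha)_v(Y_\alpha(v)) = D_{X_\alpha(v)} Y_\alpha - D_{Y_\alpha(v)} X_\alpha.$$

We have thus shown that the limit in (6.8) exists. If we denote it by $D_X Y$, we can write

$$(D_X Y)_\alpha(v) = D_{X_\alpha(v)} Y_\alpha - D_{Y_\alpha(v)} X_\alpha. \tag{6.10}$$

As before, we can use (6.10) as the definition of $D_X Y$ for arbitrary vector fields $X$ and $Y$. Again, this represents the derivative of $Y$ with respect to the flow generated by $X$, that is, the limit of (6.8), where now (6.8) is only locally defined.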

From (6.10) we derive the surprising result that $D_X Y = -D_Y X$. For this reason it is convenient to introduce a notation which expresses the antisymmetry more clearly, and we shall write

$$D_X Y = [X, Y].$$

The expression on the right-hand side is called the Lie bracket of $X$ and $Y$. We have

$$[X, Y] = -[Y, X]. \tag{6.11}$$


Let us evaluate the Lie bracket for some of the examples listed in the beginning of Section 5. Let $M = \mathbb{R}^n$.

Example 1. If $X_{\mathrm{id}} = w_1$ and $Y_{\mathrm{id}} = w_2$ are "constant" vector fields, then (6.10) shows that $[X, Y] = 0$.

Example 2. Let $X_{\mathrm{id}}(v) = Av$, where $A$ is a linear transformation, and let $Y_{\mathrm{id}} = w$. Then (6.10) says that

$$[X, Y]_{\mathrm{id}}(v) = -Aw,$$

since the directional derivative of the linear function $Av$ with respect to $w$ is $Aw$.

Example 3. Let $X_{\mathrm{id}}(v) = Av$ and $Y_{\mathrm{id}}(v) = Bv$, where $A$ and $B$ are linear transformations. Then by (6.10),

$$[X, Y]_{\mathrm{id}}(v) = BAv - ABv = (BA - AB)v. \tag{6.12}$$

Thus in this case $[X, Y]$ again comes from a linear transformation, namely, $BA - AB$. In this case the antisymmetry in $A$ and $B$ is quite apparent.
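
Formula (6.10) lends itself to a direct numerical check against (6.12). The sketch below is illustrative only: it approximates the two directional derivatives in (6.10) by central differences, and the matrices `A`, `B`, the point `v`, and all function names are hypothetical.

```python
def mat_vec(M, v):
    """Multiply the square matrix M (list of rows) by the vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def directional_derivative(F, v, w, h=1e-6):
    """Central-difference approximation of D_w F at v for a vector-valued F."""
    plus = F([a + h * b for a, b in zip(v, w)])
    minus = F([a - h * b for a, b in zip(v, w)])
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]


def lie_bracket(X, Y, v):
    """Evaluate D_{X(v)} Y - D_{Y(v)} X at v, following (6.10)."""
    dY = directional_derivative(Y, v, X(v))
    dX = directional_derivative(X, v, Y(v))
    return [a - b for a, b in zip(dY, dX)]


# Hypothetical linear fields X(v) = Av and Y(v) = Bv.
A = [[1.0, 2.0], [0.0, 1.0]]
B = [[0.0, 1.0], [3.0, 0.0]]
X = lambda v: mat_vec(A, v)
Y = lambda v: mat_vec(B, v)

v = [1.0, -2.0]
numeric = lie_bracket(X, Y, v)
BAv = mat_vec(B, mat_vec(A, v))
ABv = mat_vec(A, mat_vec(B, v))
expected = [a - b for a, b in zip(BAv, ABv)]   # (BA - AB) v, as in (6.12)
print(numeric, expected)
```

For linear fields the central differences are exact up to rounding, so the two printed vectors should agree closely; for nonlinear fields the same helper gives only an approximation.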


We now return to the general case. Let $\varphi$ be a one-parameter group on $M$, let $Y$ be a smooth vector field on $M$, and let $f$ be a differentiable function on $M$. According to (6.7),

$$D_{\varphi_t^* Y}(\varphi_t^*[f]) = \varphi_t^*(D_Y f).$$

Then

$$\begin{aligned} \frac{\varphi_t^*(D_Y f) - D_Y f}{t} &= \frac{D_{\varphi_t^* Y}(\varphi_t^*[f]) - D_Y \varphi_t^*[f]}{t} + \frac{D_Y(\varphi_t^*[f]) - D_Y f}{t} \\ &= D_{(\varphi_t^* Y - Y)/t}\, \varphi_t^*[f] + D_Y\left(\frac{\varphi_t^*[f] - f}{t}\right). \end{aligned}$$

Since the functions $\varphi_t^*[f]$ are uniformly differentiable, we may take the limit as $t \to 0$ to obtain

$$\begin{aligned} D_X(D_Y f) &= D_{D_X Y} f + D_Y(D_X f) \\ &= D_{[X, Y]} f + D_Y(D_X f). \end{aligned}$$

In other words,

$$D_{[X, Y]} f = D_X(D_Y f) - D_Y(D_X f). \tag{6.13}$$

In view of its definition as a derivative, it is clear that $D_X Y$ is linear in $Y$:

$$D_X(aY_1 + bY_2) = aD_X Y_1 + bD_X Y_2$$

if $a$ and $b$ are constants and $X$, $Y_1$, and $Y_2$ are vector fields. By the antisymmetry, it must therefore also be linear in $X$. That is,

$$D_{aX_1 + bX_2} Y = [aX_1 + bX_2, Y] = a[X_1, Y] + b[X_2, Y] = aD_{X_1} Y + bD_{X_2} Y.$$


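Let $X$ and $Y$ be vector fields on a manifold $M_2$, and let $\psi$ be a diffeomorphism of $M_1$ onto $M_2$. Then

$$\psi^*[X, Y] = [\psi^* X, \psi^* Y]. \tag{6.14}$$

In fact, suppose $X$ generates the flow $\varphi$. Then

$$\begin{aligned} \psi^*[X, Y] = \psi^* D_X Y &= \psi^* \lim_{t \to 0} \frac{\varphi_t^* Y - Y}{t} \\ &= \lim_{t \to 0} \frac{\psi^* \varphi_t^* Y - \psi^* Y}{t} \\ &= \lim_{t \to 0} \frac{(\varphi_t \circ \psi)^* Y - \psi^* Y}{t} \\ &= \lim_{t \to 0} \frac{(\psi^{-1} \circ \varphi_t \circ \psi)^* \psi^* Y - \psi^* Y}{t}. \end{aligned}$$

Since $\psi^{-1} \circ \varphi_t \circ \psi$ is the flow generated by $\psi^* X$, we conclude that the last limit is $D_{\psi^* X} \psi^* Y$, which proves (6.14).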

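Now let $Y$ and $Z$ be smooth vector fields on $M$, and let $X$ be the infinitesimal generator of $\varphi$. Then, by (6.14),

$$\frac{\varphi_t^*[Y, Z] - [Y, Z]}{t} = \frac{[\varphi_t^* Y, \varphi_t^* Z] - [Y, Z]}{t} = \left[\frac{\varphi_t^* Y - Y}{t}, \varphi_t^* Z\right] + \left[Y, \frac{\varphi_t^* Z - Z}{t}\right] \to [D_X Y, Z] + [Y, D_X Z].$$

Thus

$$[X, [Y, Z]] = [[X, Y], Z] + [Y, [X, Z]]. \tag{6.15}$$

In view of the antisymmetry of the Lie bracket, Eq. (6.15) can be rewritten as

$$[X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0. \tag{6.16}$$

Equation (6.15), or (6.16), is known as Jacobi's identity.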
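
For linear vector fields, where (6.12) identifies the bracket of $X(v) = Av$ and $Y(v) = Bv$ with the matrix $BA - AB$, Jacobi's identity (6.16) becomes a statement about matrices that can be verified exactly with integer arithmetic. The following is a minimal sketch; the three matrices and the helper names are hypothetical choices for illustration.

```python
def mat_mul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(P, Q):
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def mat_sub(P, Q):
    return [[a - b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def bracket(A, B):
    """Matrix of [X, Y] for X(v) = Av and Y(v) = Bv, namely BA - AB as in (6.12)."""
    return mat_sub(mat_mul(B, A), mat_mul(A, B))


# Hypothetical integer matrices, so that the check is exact.
A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[0, 1, 1], [2, 0, 5], [1, 1, 0]]
C = [[3, 0, 2], [1, 1, 0], [0, 4, 1]]

jacobi_sum = mat_add(
    mat_add(bracket(A, bracket(B, C)), bracket(B, bracket(C, A))),
    bracket(C, bracket(A, B)),
)
zero = [[0] * 3 for _ in range(3)]
print(jacobi_sum == zero)
```

The same helper also exhibits the antisymmetry (6.11), since `bracket(A, B)` and `bracket(B, A)` differ only in sign.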


7. LINEAR DIFFERENTIAL FORMS 


Let $M$ be a differentiable manifold. We have attached to each $x \in M$ a vector space $T_x(M)$. Its dual space, $(T_x(M))^*$, is called the cotangent space to $M$ at $x$, and will be denoted by $T_x^*(M)$. Thus an element of $T_x^*(M)$ is a continuous linear function on $T_x(M)$; it is called a covector.

Some explanation of the word "continuous" is in order. In the case where $M$ [and hence $T_x(M)$] is finite-dimensional, all linear functions on $T_x(M)$ are continuous, so no further comment is necessary. We shall be concerned primarily with this case. More generally, let $l$ be a linear function on $T_x(M)$. For any chart $(U, \alpha)$ about $x$ we have identified $T_x(M)$ with $V$, thus identifying $\xi \in T_x(M)$ with $\xi_\alpha \in V$. Then $l$ determines a linear function $l_\alpha$ on $V$ by

$$\langle \xi_\alpha, l_\alpha\rangle = \langle \xi, l\rangle. \tag{7.1}$$

If $(W, \beta)$ is a second chart, then

$$\langle \xi_\beta, l_\beta\rangle = \langle J_{\alpha \circ \beta^{-1}}(\beta(x))\,\xi_\beta, l_\alpha\rangle.$$

Since $J_{\alpha \circ \beta^{-1}}(\beta(x))$ is a continuous map of $V$ into $V$, we see that $l_\alpha$ is continuous if and only if $l_\beta$ is. We shall therefore say that $l$ is continuous if $l_\alpha$ is continuous for some (and hence any) $\alpha$. In this case we see that (7.1) gives us an identification of $T_x^*(M)$ with $V^*$ sending $l$ into $l_\alpha$. The last equation says that the rule for change of charts is given by

$$l_\beta = (J_{\alpha \circ \beta^{-1}}(\beta(x)))^* l_\alpha. \tag{7.2}$$


Let $f$ be a differentiable function on $M$, and let $x \in M$. Then the function on $T_x(M)$ sending each $\xi \in T_x(M)$ into $\xi(f)$ will be denoted by $df(x)$. Thus

$$\langle \xi, df(x)\rangle = \xi f.$$

It is easy to see that $df(x) \in T_x^*(M)$. In fact, in terms of a chart $(U, \alpha)$ about $x$,

$$\langle \xi, df(x)\rangle = D_{\xi_\alpha}(f_\alpha)(\alpha(x)).$$

Note that $df$ assigns an element $df(x)$ of $T_x^*(M)$ to each $x \in M$. A map which assigns to each $x \in M$ an element of $T_x^*(M)$ will be called a covector field or a linear differential form. The linear differential form determined by the function $f$ will be denoted simply by $df$.

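Let $\omega$ be a linear differential form. Thus $\omega(x) \in T_x^*(M)$ for each $x \in M$. Let $\mathcal{A}$ be an atlas of $M$. For each $(U, \alpha) \in \mathcal{A}$ we obtain the $V^*$-valued function $\omega_\alpha$ on $\alpha(U)$ defined by

$$\omega_\alpha(v) = (\omega(\alpha^{-1}(v)))_\alpha \quad \text{for } v \in \alpha(U). \tag{7.3}$$

If $(W, \beta) \in \mathcal{A}$, then (7.2) says that

$$\begin{aligned} \omega_\beta(\beta \circ \alpha^{-1}(v)) &= (J_{\alpha \circ \beta^{-1}}(\beta \circ \alpha^{-1}(v)))^* \omega_\alpha(v) \\ &= ((J_{\beta \circ \alpha^{-1}}(v))^{-1})^* \omega_\alpha(v) \quad \text{for } v \in \alpha(U \cap W). \end{aligned} \tag{7.4}$$

As before, Eq. (7.4) shows that it makes sense to require that $\omega$ be smooth. We say that $\omega$ is a $C^\infty$-differential form if $\omega_\alpha$ is a $V^*$-valued $C^\infty$-function for every chart $(U, \alpha)$. By (7.4) it suffices to check this for all charts in an atlas. Also, if we are given $V^*$-valued functions $\omega_\alpha$, each defined on $\alpha(U)$, $(U, \alpha) \in \mathcal{A}$, and satisfying (7.4), then they define a linear differential form $\omega$ on $M$ via (7.3).

If $\omega$ is a differential form and $f$ is a function, we define the form $f\omega$ by $f\omega(x) = f(x)\omega(x)$. Similarly, we define $\omega_1 + \omega_2$ by

$$(\omega_1 + \omega_2)(x) = \omega_1(x) + \omega_2(x).$$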


Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi: M_1 \to M_2$ be a differentiable map. For any $x \in M_1$ we have the map $\varphi_{*x}: T_x(M_1) \to T_{\varphi(x)}(M_2)$. It therefore defines a dual map

$$(\varphi_{*x})^*: T^*_{\varphi(x)}(M_2) \to T^*_x(M_1).$$

(The reader can check that if $l \in T^*_{\varphi(x)}(M_2)$, then $\xi \mapsto \langle \varphi_{*x}(\xi), l\rangle$ is a continuous linear function of $\xi$, by verification in terms of a chart.)

Now let $\omega$ be a differential form on $M_2$. It assigns $\omega(\varphi(x)) \in T^*_{\varphi(x)}(M_2)$ to $\varphi(x)$, and thus assigns an element $(\varphi_{*x})^*(\omega(\varphi(x))) \in T^*_x(M_1)$ to $x \in M_1$. We thus "pull back" the form $\omega$ to obtain a form on $M_1$ which we shall call $\varphi^*\omega$. Thus

$$\varphi^*\omega(x) = (\varphi_{*x})^* \omega(\varphi(x)). \tag{7.5}$$

Note that $\varphi^*$ is defined for any differentiable map as in the case of functions, not only for diffeomorphisms (and in contrast to the situation for vector fields).

It is easy to give the expression for $\varphi^*$ in terms of compatible charts $(U, \alpha)$ of $M_1$ and $(W, \beta)$ of $M_2$. In fact, from the local expression for $\varphi_*$ we see that

$$(\varphi^*\omega)_\alpha(v) = (J_{\beta \circ \varphi \circ \alpha^{-1}}(v))^* \omega_\beta(\beta \circ \varphi \circ \alpha^{-1}(v)), \quad v \in \alpha(U). \tag{7.6}$$

From (7.6) we see that $\varphi^*\omega$ is smooth if $\omega$ is. It is clear that $\varphi^*$ preserves algebraic operations:

$$\varphi^*(\omega_1 + \omega_2) = \varphi^*\omega_1 + \varphi^*\omega_2 \tag{7.7}$$

and

$$\varphi^*(f\omega) = \varphi^*[f]\,\varphi^*(\omega). \tag{7.8}$$

If $\varphi: M_1 \to M_2$ and $\psi: M_2 \to M_3$ are differentiable maps, then (4.4) and (7.5) show that

$$(\psi \circ \varphi)^*\omega = \varphi^*\psi^*\omega. \tag{7.9}$$

Let $\varphi: M_1 \to M_2$ be a differentiable map, and let $f$ be a differentiable function on $M_2$. Then (4.6) and the definition of $df$ show that

$$d(\varphi^*[f]) = \varphi^* df. \tag{7.10}$$


Let $\varphi$ be a flow on $M$ with infinitesimal generator $X$, and let $\omega$ be a smooth linear differential form on $M$. Then the form $\varphi_t^*\omega$ is locally defined and, as in the case of functions and vector fields, the limit as $t \to 0$ of

$$\frac{\varphi_t^*\omega - \omega}{t}$$

exists. We can verify this by using (7.6) and proceeding as we did in the case of vector fields. The limit will be a smooth covector field which we shall call $D_X\omega$. We could give an expression for $D_X\omega$ in terms of a chart, just as we did for vector fields.


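If $f$ is a differentiable function, $\omega$ a smooth differential form, and $X$ the infinitesimal generator of $\varphi$, then

$$D_X(f\omega) = (D_X f)\omega + fD_X\omega. \tag{7.11}$$

In fact,

$$\begin{aligned} D_X(f\omega) &= \lim_{t \to 0} \frac{\varphi_t^*(f\omega) - f\omega}{t} \\ &= \lim_{t \to 0} \left(\frac{\varphi_t^*[f] - f}{t}\,\varphi_t^*\omega + f\,\frac{\varphi_t^*\omega - \omega}{t}\right) \\ &= (D_X f)\omega + fD_X\omega. \end{aligned}$$

If $g$ is a differentiable function on $M$, then

$$\frac{\varphi_t^*\, dg - dg}{t} = \frac{d\varphi_t^*[g] - dg}{t} = d\left(\frac{\varphi_t^*[g] - g}{t}\right).$$

An easy verification in terms of a chart shows that the limit of this last expression exists and is indeed $d(D_X g)$. Thus

$$D_X(df) = d(D_X f). \tag{7.12}$$

Equations (7.11) and (7.12) show that if

$$\omega = f_1\, dg_1 + \cdots + f_k\, dg_k,$$

then

$$D_X\omega = (D_X f_1)\, dg_1 + \cdots + (D_X f_k)\, dg_k + f_1\, d(D_X g_1) + \cdots + f_k\, d(D_X g_k). \tag{7.12'}$$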


Let $\omega$ be a smooth linear differential form, and let $X$ be a smooth vector field. Then $\langle X, \omega\rangle$ is a smooth function given by

$$\langle X, \omega\rangle(x) = \langle X(x), \omega(x)\rangle.$$

Note that $\langle X, \omega\rangle$ is linear in both $X$ and $\omega$. Also observe that for any smooth function $f$ we have

$$\langle X, df\rangle = D_X f. \tag{7.13}$$


8. COMPUTATIONS WITH COORDINATES


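For the remainder of this chapter we shall assume that our manifolds are finite-dimensional. Let $M$ be a differentiable manifold whose $V = \mathbb{R}^n$. If $(U, \alpha)$ is a chart of $M$, then we define the function $x^i_\alpha$ on $U$ by setting

$$x^i_\alpha(x) = i\text{th coordinate of } \alpha(x). \tag{8.1}$$

If $f$ is any differentiable function on $U$, then we can write Eq. (2.1) as

$$f(x) = f_\alpha(x^1_\alpha(x), \ldots, x^n_\alpha(x)),$$

which we shall write as

$$f = f_\alpha(x^1_\alpha, \ldots, x^n_\alpha). \tag{8.2}$$

We define the vector field $\partial/\partial x^i_\alpha$ on $U$ by

$$\left(\frac{\partial}{\partial x^i_\alpha}\right)_\alpha = \delta_i \quad (= \langle 0, \ldots, 1, \ldots, 0\rangle, \text{ with } 1 \text{ in the } i\text{th position}). \tag{8.3}$$

If $X$ is any vector field on $U$, then we have

$$X = X^1_\alpha \frac{\partial}{\partial x^1_\alpha} + \cdots + X^n_\alpha \frac{\partial}{\partial x^n_\alpha}, \tag{8.4}$$

where the functions $X^i_\alpha$ are defined by

$$(X)_\alpha(\alpha(x)) = \langle X^1_\alpha(x), \ldots, X^n_\alpha(x)\rangle. \tag{8.5}$$

Equation (8.4) allows us to regard the vector field $X$ as a "differential operator". In fact, it follows from the definitions that

$$D_X f = X^1_\alpha \frac{\partial f_\alpha}{\partial x^1_\alpha} + \cdots + X^n_\alpha \frac{\partial f_\alpha}{\partial x^n_\alpha}. \tag{8.6}$$

Since $x^i_\alpha$ is a differentiable function on $U$, $dx^i_\alpha$ is a differential form on $U$ and

$$(dx^i_\alpha)_\alpha(v) = \delta^i \quad \text{for all } v \in \alpha(U). \tag{8.7}$$

In particular,

$$\left\langle \frac{\partial}{\partial x^j_\alpha}, dx^i_\alpha\right\rangle = \delta^i_j. \tag{8.8}$$

If $\omega$ is a differential form on $U$, then

$$\omega = a_{1\alpha}\, dx^1_\alpha + \cdots + a_{n\alpha}\, dx^n_\alpha, \tag{8.9}$$

where the functions $a_{i\alpha}$ are defined by

$$\omega_\alpha(\alpha(x)) = \langle a_{1\alpha}(x), \ldots, a_{n\alpha}(x)\rangle \in \mathbb{R}^{n*}. \tag{8.10}$$

It then follows from the definitions that

$$df = \frac{\partial f_\alpha}{\partial x^1_\alpha}\, dx^1_\alpha + \cdots + \frac{\partial f_\alpha}{\partial x^n_\alpha}\, dx^n_\alpha. \tag{8.11}$$

Equation (8.11) has built into it the transition law for differential forms under a change of charts. In fact, if $(W, \beta)$ is a second chart, then on $U \cap W$ we have, by (8.11),

$$dx^j_\beta = \frac{\partial x^j_\beta}{\partial x^1_\alpha}\, dx^1_\alpha + \cdots + \frac{\partial x^j_\beta}{\partial x^n_\alpha}\, dx^n_\alpha. \tag{8.12}$$

If we write $\omega = a_{1\beta}\, dx^1_\beta + \cdots + a_{n\beta}\, dx^n_\beta$ and substitute (8.12), we get

$$a_{i\alpha} = \sum_{j=1}^{n} \frac{\partial x^j_\beta}{\partial x^i_\alpha}\, a_{j\beta}.$$

Now

$$\left[\frac{\partial x^j_\beta}{\partial x^i_\alpha}\right]$$

is the matrix $J_{\beta \circ \alpha^{-1}}$. If we compare with (8.10), we see that we have recovered (7.4).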

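Since the subscripts $\alpha$, $\beta$, etc., clutter up the formulas, we shall frequently use the following notational conventions: Instead of writing $x^i_\alpha$ we shall write $x^i$, and instead of writing $x^i_\beta$ we shall write $y^i$. Thus

$$x^i = x^i_\alpha, \quad y^j = x^j_\beta, \quad z^k = x^k_\gamma, \quad \text{etc.}$$

Similarly, we shall write $X^i$ for $X^i_\alpha$, $Y^i$ for $X^i_\beta$, $a_i$ for $a_{i\alpha}$, $b_i$ for $a_{i\beta}$, and so on. Then Eqs. (8.1) through (8.12) can be written as

$$x^i(x) = i\text{th coordinate of } \alpha(x), \tag{8.1'}$$

$$f = f_\alpha(x^1, \ldots, x^n), \tag{8.2'}$$

$$\left(\frac{\partial}{\partial x^i}\right)_\alpha = \delta_i, \tag{8.3'}$$

$$X = X^1 \frac{\partial}{\partial x^1} + \cdots + X^n \frac{\partial}{\partial x^n}, \tag{8.4'}$$

$$(X)_\alpha(\alpha(x)) = \langle X^1(x), \ldots, X^n(x)\rangle, \tag{8.5'}$$

$$D_X f = X^1 \frac{\partial f_\alpha}{\partial x^1} + \cdots + X^n \frac{\partial f_\alpha}{\partial x^n}, \tag{8.6'}$$

$$(dx^i)_\alpha(v) = \delta^i, \tag{8.7'}$$

$$\left\langle \frac{\partial}{\partial x^j}, dx^i\right\rangle = \delta^i_j, \tag{8.8'}$$

$$\omega = a_1\, dx^1 + \cdots + a_n\, dx^n, \tag{8.9'}$$

$$\omega_\alpha(\alpha(x)) = \langle a_1(x), \ldots, a_n(x)\rangle, \tag{8.10'}$$

$$df = \frac{\partial f_\alpha}{\partial x^1}\, dx^1 + \cdots + \frac{\partial f_\alpha}{\partial x^n}\, dx^n, \tag{8.11'}$$

$$dy^j = \frac{\partial y^j}{\partial x^1}\, dx^1 + \cdots + \frac{\partial y^j}{\partial x^n}\, dx^n. \tag{8.12'}$$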


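The formulas for "pullback" also take a simple form. Let $\psi: M_1 \to M_2$ be a differentiable map, and suppose that $M_1$ is $m$-dimensional and $M_2$ is $n$-dimensional. Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts. Then the map

$$\beta \circ \psi \circ \alpha^{-1}: \alpha(U) \to \beta(W)$$

is given by

$$y^i(\psi(x)) = y^i(x^1, \ldots, x^m), \quad i = 1, \ldots, n, \tag{8.13}$$

that is, by $n$ functions of $m$ real variables. If $f$ is a function on $M_2$ with

$$f = f_\beta(y^1, \ldots, y^n) \quad \text{on } W,$$

then

$$\psi^*[f] = f_\alpha(x^1, \ldots, x^m) \quad \text{on } U,$$

where

$$f_\alpha(x^1, \ldots, x^m) = f_\beta(y^1(x^1, \ldots, x^m), \ldots, y^n(x^1, \ldots, x^m)). \tag{8.14}$$

The rule for "pulling back" a differential form is also very easy. In fact, if

$$\omega = a_1\, dy^1 + \cdots + a_n\, dy^n \quad \text{on } W,$$

then $\psi^*\omega$ has the same form on $U$, where we now regard the $a$'s and $y$'s as functions of the $x$'s and expand by using (8.12). Thus

$$\psi^*\omega = \sum_{i=1}^{n} a_i\, dy^i = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i \frac{\partial y^i}{\partial x^j}\, dx^j,$$

where $a_i = a_i(y^1(x^1, \ldots, x^m), \ldots, y^n(x^1, \ldots, x^m))$.

Let $x \in U$. Then

$$\psi_{*x}\left(\frac{\partial}{\partial x^i}(x)\right) = \sum_{j=1}^{n} \frac{\partial y^j}{\partial x^i}(x)\, \frac{\partial}{\partial y^j}(\psi(x)) \tag{8.15}$$

gives the formula for $\psi_{*x}$.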
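
The coordinate rule for pulling back a form can be tried on a small case. The sketch below is an illustration under stated assumptions: it takes the hypothetical map $\psi(r, \theta) = \langle r\cos\theta, r\sin\theta\rangle$ and the form $\omega = -y\, dx + x\, dy$, and computes the coefficients $\sum_i a_i\, \partial y^i/\partial x^j$ with the partial derivatives approximated by central differences. The function names `partial` and `pullback_form` are not from the text.

```python
import math


def partial(fn, point, j, h=1e-6):
    """Central-difference partial derivative of fn with respect to coordinate j."""
    up, down = list(point), list(point)
    up[j] += h
    down[j] -= h
    return (fn(up) - fn(down)) / (2 * h)


def pullback_form(coeffs, y_funcs, x):
    """Coefficients of psi^* omega at x: sum_i a_i(y(x)) * dy^i/dx^j for each j."""
    y = [y_i(x) for y_i in y_funcs]
    a = [a_i(y) for a_i in coeffs]
    return [
        sum(a[i] * partial(y_funcs[i], x, j) for i in range(len(y_funcs)))
        for j in range(len(x))
    ]


# Hypothetical example: source coordinates (r, theta), target coordinates (x, y).
y_funcs = [lambda p: p[0] * math.cos(p[1]), lambda p: p[0] * math.sin(p[1])]
# omega = a_1 dx + a_2 dy with a_1 = -y and a_2 = x.
coeffs = [lambda q: -q[1], lambda q: q[0]]

r, theta = 2.0, 0.75
components = pullback_form(coeffs, y_funcs, [r, theta])
print(components)
```

A hand computation with the same rule gives $\psi^*\omega = r^2\, d\theta$, so the first printed coefficient should be close to 0 and the second close to $r^2$.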


EXERCISES 


8.1 Let $x$ and $y$ be rectangular coordinates on $E^2$, and let $(r, \theta)$ be polar "coordinates" on $E^2 - \{0\}$. Express the vector fields $\partial/\partial r$ and $\partial/\partial\theta$ in terms of rectangular coordinates. Express $\partial/\partial x$ and $\partial/\partial y$ in terms of polar coordinates.

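8.2 Let $x$, $y$, $z$ be rectangular coordinates on $E^3$. Let

$$X = y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}, \quad Y = z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z}, \quad \text{and} \quad Z = x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}.$$

Compute $[X, Y]$, $[X, Z]$, and $[Y, Z]$. Note that $X$ represents the infinitesimal generator of the one-parameter group of rotations about the $x$-axis. We sometimes call $X$ the "infinitesimal rotation about the $x$-axis". We can do the same for $Y$ and $Z$.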


8.3 Let 


Xx 


o 
-v5 2 Fae" 


Compute $[A, B]$, $[A, C]$, and $[B, C]$. Show that $Af = Bf = Cf = 0$ if $f(x, y, z) = x^2 + y^2 - z^2$. Sketch the integral curves of each of the vector fields $A$, $B$, and $C$.


Q é 
+25 Bueretis and C= 




8.4 Let 


7) ra) o ra) ta) e 
D=ausres E= ts — ea? and as ey 


Compute $[D, E]$, $[D, F]$, and $[E, F]$.
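8.5 Let $P_1, \dots, P_n$ be polynomials in $x^1, \dots, x^n$ with no constant term, that is,

\[ P_i(0, \dots, 0) = 0. \]

Let

\[ I = x^1 \frac{\partial}{\partial x^1} + \cdots + x^n \frac{\partial}{\partial x^n} \]

and

\[ X = P_1 \frac{\partial}{\partial x^1} + \cdots + P_n \frac{\partial}{\partial x^n}. \]

Show that

\[ [I, X] = 0 \]

if and only if the $P_i$'s are linear. [Hint: Consider the expansion of the $P_i$'s into homogeneous terms.]

8.6 Let $X$ and the $P_i$'s be as in Exercise 8.5, and suppose that the $P_i$'s are linear. Let

\[ A = \lambda_1 x^1 \frac{\partial}{\partial x^1} + \cdots + \lambda_n x^n \frac{\partial}{\partial x^n}, \]

and suppose that

\[ \lambda_i \ne \lambda_j \quad \text{for } i \ne j. \]

Show that $[A, X] = 0$ if and only if $P_i = \mu_i x^i$, that is,

\[ X = \mu_1 x^1 \frac{\partial}{\partial x^1} + \cdots + \mu_n x^n \frac{\partial}{\partial x^n} \quad \text{for some } \mu_1, \dots, \mu_n. \]

8.7 Let $A$ be as in Exercise 8.6, and suppose, in addition, that

\[ \lambda_i \ne \lambda_j + \lambda_r \quad \text{for any } i, j, r. \]

Show that if the $P_i$'s are at most quadratic, then

\[ [A, X] = 0 \]

if and only if $P_i = \mu_i x^i$. Generalize this result to the case where $P_i$ can be a polynomial of degree at most $m$.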


9. RIEMANN METRICS 


Let $M$ be a finite-dimensional differentiable manifold. A Riemann metric, $m$, on $M$ is a rule which assigns a positive definite scalar product $(\ ,\ )_{m,x}$ to the vector space $T_x(M)$ for each $x \in M$. We shall usually drop the subscripts $m$ and $x$ when they are understood from the context. Thus if $m$ is a Riemann metric on $M$, $x \in M$, and $\xi, \eta \in T_x(M)$, we shall write the scalar product of $\xi$ and $\eta$ as

\[ (\xi, \eta) = (\xi, \eta)_{m,x}. \]




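Let $(U, \alpha)$ be a chart of $M$. Define the functions $g_{ij}$ on $U$ by setting

\[ g_{ij}(x) = \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr), \]

so that $g_{ij} = g_{ji}$. If $\xi, \eta \in T_x(M)$ with

\[ \xi = \sum \xi^i \frac{\partial}{\partial x^i}(x) \quad \text{and} \quad \eta = \sum \eta^j \frac{\partial}{\partial x^j}(x), \]

then

\[ (\xi, \eta) = \sum g_{ij}(x)\,\xi^i \eta^j. \tag{9.1} \]

Since $dx^1(x), \dots, dx^n(x)$ is the basis of $T_x^*(M)$ dual to the basis

\[ \frac{\partial}{\partial x^1}(x), \dots, \frac{\partial}{\partial x^n}(x), \]

we have

\[ \xi^i = \langle \xi, dx^i(x) \rangle, \qquad \eta^j = \langle \eta, dx^j(x) \rangle, \]

so that the last equation can be written as

\[ (\xi, \eta)_{m,x} = \sum g_{ij}(x)\,\langle \xi, dx^i \rangle \langle \eta, dx^j \rangle. \tag{9.2} \]

Equation (9.2) is usually written more succinctly as

\[ m \restriction U = \sum g_{ij}(x)\,dx^i\,dx^j. \tag{9.3} \]

[Here (9.3) is to be interpreted as a short way of writing (9.2).]

Let $(W, \beta)$ be a second chart with

\[ h_{kl}(x) = \Bigl( \frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x) \Bigr), \qquad x \in W, \]

so that

\[ m \restriction W = \sum h_{kl}\,dy^k\,dy^l. \tag{9.4} \]

Then for $x \in U \cap W$, we have

\[ \frac{\partial}{\partial x^i}(x) = \sum_k \frac{\partial y^k}{\partial x^i}(x)\,\frac{\partial}{\partial y^k}(x), \qquad \frac{\partial}{\partial x^j}(x) = \sum_l \frac{\partial y^l}{\partial x^j}(x)\,\frac{\partial}{\partial y^l}(x), \]

so

\[ g_{ij}(x) = \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr) = \sum_{k,l} \frac{\partial y^k}{\partial x^i}(x)\,\frac{\partial y^l}{\partial x^j}(x)\,h_{kl}(x), \]

that is,

\[ g_{ij} = \sum_{k,l} h_{kl}\,\frac{\partial y^k}{\partial x^i}\,\frac{\partial y^l}{\partial x^j}. \tag{9.5} \]

Note that (9.5) is the answer we would get if we formally substituted (8.12) for the $dy$'s in (9.4) and collected the coefficients of $dx^i\,dx^j$.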


In any event, it is clear from (9.5) that if the $h_{kl}$ are all smooth functions on $W$, then the $g_{ij}$ are smooth on $U \cap W$. In view of this we shall say that a Riemann metric is smooth if the functions $g_{ij}$ given by (9.3) are smooth for any chart $(U, \alpha)$ belonging to an atlas $\mathcal{A}$ of $M$. Also, if we are given functions $g_{ij} = g_{ji}$ defined for each $(U, \alpha) \in \mathcal{A}$ such that

i) $\sum g_{ij}(x)\,\xi^i \xi^j > 0$ unless $\xi = 0$, for all $x \in U$,

ii) the transition law (9.5) holds,

then the $g_{ij}$ define a Riemann metric on $M$. In the following discussion we shall assume that our Riemann metrics are smooth.

Let $\psi: M_1 \to M_2$ be a differentiable map, and let $m$ be a Riemann metric on $M_2$. For any $x \in M_1$ define $(\ ,\ )_{\psi^*(m),x}$ on $T_x(M_1)$ by

\[ (\xi, \eta)_{\psi^*(m),x} = \bigl( \psi_{*x}(\xi), \psi_{*x}(\eta) \bigr)_{m,\psi(x)}. \tag{9.6} \]

Note that this defines a symmetric bilinear function of $\xi$ and $\eta$. It is not necessarily positive definite, however, since it is conceivable that $\psi_{*x}(\xi) = 0$ with $\xi \ne 0$. Thus, in general, (9.6) does not define a Riemann metric on $M_1$. For certain $\psi$ it does.

A differentiable map $\psi: M_1 \to M_2$ is called an immersion if $\psi_{*x}$ is an injection (i.e., is one-to-one) for all $x \in M_1$.

If $\psi: M_1 \to M_2$ is an immersion and $m$ is a Riemann metric on $M_2$, then we define the Riemann metric $\psi^*(m)$ on $M_1$ by (9.6).

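Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts of $M_1$ and $M_2$, and let

\[ m \restriction W = \sum h_{kl}\,dy^k\,dy^l. \]

Then

\[ \psi^*(m) \restriction U = \sum g_{ij}\,dx^i\,dx^j, \]

where

\[ g_{ij}(x) = \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr)_{\psi^*(m)} \]
\[ = \Bigl( \psi_*\frac{\partial}{\partial x^i}(x), \psi_*\frac{\partial}{\partial x^j}(x) \Bigr)_m \]
\[ = \Bigl( \sum_k \frac{\partial y^k}{\partial x^i}\,\frac{\partial}{\partial y^k}(\psi(x)), \sum_l \frac{\partial y^l}{\partial x^j}\,\frac{\partial}{\partial y^l}(\psi(x)) \Bigr)_m \]
\[ = \sum_{k,l} \frac{\partial y^k}{\partial x^i}(x)\,\frac{\partial y^l}{\partial x^j}(x)\,h_{kl}(\psi(x)), \]

which is just (9.5) again (with a different interpretation). Or, more succinctly,

\[ \psi^*(m) \restriction U = \sum \psi^*(h_{kl})\,\psi^*(dy^k)\,\psi^*(dy^l). \]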


Let us give some examples of these formulas. If $M = R^n$, then the identity chart induces a Riemann metric on $R^n$ given by

\[ (dx^1)^2 + \cdots + (dx^n)^2. \]

Let us see what this looks like in terms of polar coordinates in $R^2$ and $R^3$.




In $R^2$ if we write

\[ x^1 = r\cos\theta, \qquad x^2 = r\sin\theta, \]

then

\[ dx^1 = \cos\theta\,dr - r\sin\theta\,d\theta, \]
\[ dx^2 = \sin\theta\,dr + r\cos\theta\,d\theta, \]

so

\[ (dx^1)^2 + (dx^2)^2 = dr^2 + r^2\,d\theta^2. \tag{9.7} \]

Note that (9.7) holds whenever the forms $dr$ and $d\theta$ are defined, i.e., on all of $R^2 - \{0\}$. (Even though the function $\theta$ is not well defined on all of $R^2 - \{0\}$, the form $d\theta$ is. In fact, we can write

\[ d\theta = \frac{x^1\,dx^2 - x^2\,dx^1}{(x^1)^2 + (x^2)^2}.) \tag{9.8} \]
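As a numerical cross-check of (9.7), the following minimal Python sketch (standard library only) evaluates the transition law (9.5) with $h_{kl} = \delta_{kl}$ for the polar chart. The function name `polar_metric` and the sample point are illustrative choices, not part of the text.

```python
import math

def polar_metric(r, theta):
    """Matrix (g_ij) of the Euclidean metric of R^2 in the coordinates (r, theta)."""
    # Jacobian of (r, theta) -> (x1, x2) = (r cos(theta), r sin(theta));
    # J[k][i] is the derivative of x^(k+1) with respect to the i-th polar coordinate.
    J = [[math.cos(theta), -r * math.sin(theta)],
         [math.sin(theta),  r * math.cos(theta)]]
    # Transition law (9.5) with h_kl = delta_kl: g_ij = sum_k J[k][i] * J[k][j].
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Sample point r = 2, theta = 0.7: (9.7) predicts g = diag(1, r^2) = diag(1, 4).
g = polar_metric(2.0, 0.7)
```

The off-diagonal entries vanish and the diagonal entries are $1$ and $r^2$, in agreement with $dr^2 + r^2\,d\theta^2$.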


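In $R^3$ we introduce

\[ x^1 = r\cos\varphi\sin\theta, \]
\[ x^2 = r\sin\varphi\sin\theta, \]
\[ x^3 = r\cos\theta. \]

Then

\[ dx^1 = \cos\varphi\sin\theta\,dr - r\sin\varphi\sin\theta\,d\varphi + r\cos\varphi\cos\theta\,d\theta, \]
\[ dx^2 = \sin\varphi\sin\theta\,dr + r\cos\varphi\sin\theta\,d\varphi + r\sin\varphi\cos\theta\,d\theta, \]
\[ dx^3 = \cos\theta\,dr - r\sin\theta\,d\theta. \]

Thus

\[ (dx^1)^2 + (dx^2)^2 + (dx^3)^2 = dr^2 + r^2\sin^2\theta\,d\varphi^2 + r^2\,d\theta^2. \tag{9.9} \]

Again, (9.9) is valid wherever the forms on the right are defined, which this time means when $(x^1)^2 + (x^2)^2 \ne 0$.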

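Let us consider the map $\iota$ of the unit sphere $S^2 \to R^3$, which consists of regarding a point of $S^2$ as a point of $R^3$. We then get an induced Riemann metric on $S^2$.

Let us set

\[ d\theta = \iota^*\,d\theta \quad \text{and} \quad d\varphi = \iota^*\,d\varphi, \]

so the forms $d\theta$ and $d\varphi$ are defined on $U = S^2 - \{\langle 0,0,1 \rangle, \langle 0,0,-1 \rangle\}$.
Then on $U$ we can write (since $r = 1$ on $S^2$)

\[ \iota^*\bigl( (dx^1)^2 + (dx^2)^2 + (dx^3)^2 \bigr) = d\theta^2 + \sin^2\theta\,d\varphi^2. \tag{9.10} \]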

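We now return to general considerations. Let $M$ be a differentiable manifold and let $C: I \to M$ be a differentiable map, where $I$ is an interval in $R^1$. Let $t$ denote the coordinate of the identity chart on $I$. We shall set

\[ C'(s) = C_{*s}\Bigl( \frac{\partial}{\partial t}(s) \Bigr), \qquad s \in I, \]

so that $C'(s) \in T_{C(s)}(M)$ is the tangent vector to the curve $C$ at $s$. If $(U, \alpha)$ is a chart on $M$ and $x^1, \dots, x^n$ are the coordinate functions of $(U, \alpha)$, then if $C(I') \subset U$ for some $I' \subset I$,

\[ \alpha \circ C = \langle x^1 \circ C, \dots, x^n \circ C \rangle, \]
\[ C'_\alpha = \Bigl\langle \frac{d(x^1 \circ C)}{dt}, \dots, \frac{d(x^n \circ C)}{dt} \Bigr\rangle. \]

When there is no possibility of confusion, we shall omit the $\circ\,C$ and simply write

\[ \alpha \circ C(t) = \langle x^1(t), \dots, x^n(t) \rangle \quad \text{and} \quad C'_\alpha(t) = \langle x^{1\prime}(t), \dots, x^{n\prime}(t) \rangle. \]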


Now let $m$ be a Riemann metric on $M$. Then $\|C'(t)\| = (C'(t), C'(t))^{1/2}$ is a continuous function. In fact, in terms of a chart, we can write

\[ \|C'(t)\|^2 = \sum g_{ij}(C(t))\,x^{i\prime}(t)\,x^{j\prime}(t), \]

so that

\[ \|C'(t)\| = \Bigl( \sum g_{ij}(C(t))\,x^{i\prime}(t)\,x^{j\prime}(t) \Bigr)^{1/2}. \]

The integral

\[ \int_I \|C'(t)\|\,dt \tag{9.11} \]

is called the length of the curve $C$. It will be defined if $\|C'(t)\|$ is integrable over $I$. This will certainly be the case if $I$ and $\|C'(t)\|$ are both bounded, for instance.
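Note that the length is independent of the parametrization. More precisely, let $\varphi: J \to I$ be a one-to-one differentiable map, and let $C_1 = C \circ \varphi$. Then at any $\tau \in J$ we have, by the chain rule,

\[ C_1'(\tau) = C_{*\varphi(\tau)}\Bigl( \varphi_{*\tau}\Bigl( \frac{\partial}{\partial\tau}(\tau) \Bigr) \Bigr), \]

that is,

\[ C_1'(\tau) = \frac{d\varphi}{d\tau}\,C'(\varphi(\tau)). \]

Thus

\[ \|C_1'(\tau)\| = \|C'(\varphi(\tau))\|\,\Bigl| \frac{d\varphi}{d\tau} \Bigr|. \tag{9.12} \]

On the other hand, by the law for change of variable in an integral we have

\[ \int_I \|C'(t)\|\,dt = \int_J \|C'(\varphi(\tau))\|\,\Bigl| \frac{d\varphi}{d\tau} \Bigr|\,d\tau = \int_J \|C_1'(\tau)\|\,d\tau \quad \text{by (9.12)}. \]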
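The independence of the length from the parametrization can be checked numerically. The sketch below (Python, standard library only) uses the Euclidean metric on $R^2$, a quarter of the unit circle as $C$, and the reparametrization $\varphi(\tau) = (\pi/2)\tau^2$; the curve, the map $\varphi$, and the helper `curve_length` are illustrative assumptions, not taken from the text.

```python
import math

def curve_length(dC, a, b, n=20000):
    """Midpoint-rule approximation of the integral (9.11) of ||C'(t)|| over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        dx, dy = dC(a + (k + 0.5) * h)
        total += math.hypot(dx, dy) * h
    return total

# C(t) = (cos t, sin t) on I = [0, pi/2]; dC returns the velocity C'(t).
dC = lambda t: (-math.sin(t), math.cos(t))

# phi maps J = [0, 1] one-to-one onto I, and C1 = C o phi.
phi = lambda tau: (math.pi / 2) * tau ** 2
dphi = lambda tau: math.pi * tau
# Chain rule: C1'(tau) = phi'(tau) * C'(phi(tau)).
dC1 = lambda tau: tuple(dphi(tau) * c for c in dC(phi(tau)))

length_C = curve_length(dC, 0.0, math.pi / 2)
length_C1 = curve_length(dC1, 0.0, 1.0)
```

Both values approximate $\pi/2$, the length of the quarter circle, as (9.12) and the change-of-variable law predict.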


More generally, we say that a curve $C$ defined on an interval $I$ is piecewise differentiable if

i) $C$ is continuous;

ii) $I = I_1 \cup \cdots \cup I_r$ and $C$, on each $I_j$, is the restriction of a differentiable curve defined on some interval $I_j'$ strictly containing $I_j$.

(Thus a piecewise differentiable curve is a curve with a finite number of “corners”.) If $C$ is piecewise differentiable, then $\|C'(t)\|$ is defined and continuous except at a finite number of $t$'s, where it may have a jump discontinuity. In particular, the integral (9.11) will exist and the curve will have a length.


Exercise. Let $C$ be a curve mapping $I$ onto a straight line segment in $R^n$ in a one-to-one manner. Show that the length of $C$ is the same as the length of the segment.


Let $C: [0, 1] \to R^2$ be a curve with $C(0) = 0$ and $C(1) = v \in R^2$. If we use the expression (9.7), we see that

\[ \int_0^1 \|C'(t)\|\,dt = \int_0^1 \sqrt{(r'(t))^2 + r(t)^2(\theta'(t))^2}\,dt \]
\[ \ge \int_0^1 |r'(t)|\,dt \]
\[ \ge \int_0^1 r'(t)\,dt \]
\[ = \|v\|, \]

with equality holding if and only if $\theta' \equiv 0$ and $r' \ge 0$. Thus among all curves joining $0$ to $v$, the straight line segment has shortest length.

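Similarly, on the sphere, let $C$ be any curve $C: [0, 1] \to S^2$ with $C(0) = (0, 0, 1)$ and $C(1) = p \ne (0, 0, -1)$, and let $\theta_1 = \theta(C(1))$. Then

\[ \int_0^1 \|C'(t)\|\,dt = \int_0^1 \sqrt{(\theta'(t))^2 + \sin^2\theta\,(\varphi'(t))^2}\,dt \ge \int_0^1 |\theta'(t)|\,dt. \]

If we let $t_1$ denote the first point in $[0, 1]$ where $\theta = \theta_1$, then

\[ \int_0^1 \|C'(t)\|\,dt \ge \int_0^1 |\theta'(t)|\,dt \ge \int_0^{t_1} |\theta'(t)|\,dt \ge \int_0^{t_1} \theta'(t)\,dt = \theta_1, \]

with equality only if $\varphi' \equiv 0$ and $t_1 = 1$. Thus the shortest curve joining any two points on $S^2$ is the great circle joining them.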
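The inequality above can be illustrated numerically. The following Python sketch (standard library only) measures two curves from the north pole to the same endpoint with the metric (9.10); the competitor curve and the helper `sphere_length` are hypothetical choices made for the illustration.

```python
import math

def sphere_length(theta, phi, n=20000):
    """Length of t -> (theta(t), phi(t)), 0 <= t <= 1, for d(theta)^2 + sin^2(theta) d(phi)^2.

    Derivatives are approximated by central differences and the integral
    by the midpoint rule.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        dth = (theta(t + h / 2) - theta(t - h / 2)) / h
        dph = (phi(t + h / 2) - phi(t - h / 2)) / h
        total += math.sqrt(dth ** 2 + math.sin(theta(t)) ** 2 * dph ** 2) * h
    return total

theta1 = 1.0  # polar angle of the endpoint p (an arbitrary sample value)

# Meridian: phi' = 0 and theta increases monotonically from 0 to theta1.
meridian = sphere_length(lambda t: theta1 * t, lambda t: 0.0)

# Competitor with the same endpoints that wanders in phi on the way.
detour = sphere_length(lambda t: theta1 * t, lambda t: math.sin(math.pi * t))
```

The meridian has length $\theta_1$, while the competitor is strictly longer because $\varphi' \not\equiv 0$.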

In both examples above we were aided by a very fortuitous choice of coordinates (polar coordinates in the plane and a kind of polar coordinates on the sphere). We shall see in Section 11, Chapter 13, that this is not accidental. We shall see that on any Riemann manifold one can introduce local coordinates in terms of which it is easy to describe the curves that locally minimize length.


CHAPTER 10 


THE INTEGRAL CALCULUS ON MANIFOLDS 


In this chapter we shall study integration on manifolds. In order to develop the integral calculus, we shall have to restrict the class of manifolds under consideration. In this chapter we shall assume that all manifolds $M$ that arise satisfy the following two conditions:

1) $M$ is finite-dimensional.
2) $M$ possesses an atlas $\mathcal{A}$ containing (at most) a countable number of charts; that is, $\mathcal{A} = \{(U_i, \alpha_i)\}_{i=1,2,\dots}$.


Before getting down to the business of integration, there are several technical 
facts to be established. The first two sections will be devoted to this task. 


1. COMPACTNESS


A subset $A$ of a manifold $M$ is said to be compact if it has the following property:

i) If $\{U_\iota\}$ is any collection of open sets with

\[ A \subset \bigcup_\iota U_\iota, \]

there exist finitely many of the $U_\iota$, say $U_{\iota_1}, \dots, U_{\iota_r}$, such that

\[ A \subset U_{\iota_1} \cup \cdots \cup U_{\iota_r}. \]

Alternatively, we can say:

ii) A set $A$ is compact if and only if for any family $\{F_\iota\}$ of closed sets such that

\[ A \cap \bigcap_\iota F_\iota = \emptyset, \]

there exist finitely many of the $F_\iota$ such that

\[ A \cap F_{\iota_1} \cap \cdots \cap F_{\iota_r} = \emptyset. \]

The equivalence of (i) and (ii) can be seen by taking $U_\iota$ equal to the complement of $F_\iota$.

In Section 5 of Chapter 4 we established that if $M = U$ is an open subset of $R^n$, then $A \subset U$ is compact if and only if $A$ is a closed bounded subset of $R^n$.




We make some further trivial remarks about compactness:

iii) If $A_1, \dots, A_r$ are compact, so is $A_1 \cup \cdots \cup A_r$.

In fact, if $\{U_\iota\}$ covers $A_1 \cup \cdots \cup A_r$, it certainly covers each $A_j$. We can thus choose for each $j$ a finite subcollection which covers $A_j$. The union of these subcollections forms a finite subcollection covering $A_1 \cup \cdots \cup A_r$.

iv) If $\psi: M_1 \to M_2$ is continuous and $A \subset M_1$ is compact, then $\psi[A]$ is compact.

In fact, if $\{U_\iota\}$ covers $\psi[A]$, then $\{\psi^{-1}(U_\iota)\}$ covers $A$. If the $U_\iota$ are open, so are the $\psi^{-1}(U_\iota)$, since $\psi$ is continuous. We can thus choose $\iota_1, \dots, \iota_r$ so that

\[ A \subset \psi^{-1}(U_{\iota_1}) \cup \cdots \cup \psi^{-1}(U_{\iota_r}), \]

which implies that $\psi[A] \subset U_{\iota_1} \cup \cdots \cup U_{\iota_r}$.

We see from this that if $A = A_1 \cup \cdots \cup A_r$, where each $A_j$ is contained in some $W_j$, where $(W_j, \beta_j)$ is a chart, and $\beta_j(A_j)$ is a compact subset of $R^n$, then $A$ is compact. In particular, the manifold $M$ itself may be compact. For instance, we can write $S^n$ as the union of the upper and lower hemispheres: $S^n = \{x : x^{n+1} \ge 0\} \cup \{x : x^{n+1} \le 0\}$. Each hemisphere is compact. In fact, the upper hemisphere is mapped onto $\{y : \|y\| \le 1\}$ by the map $\varphi_1$ of Section 8.1, and the lower hemisphere is mapped onto the same set by $\varphi_2$. Thus the sphere is compact.

On the other hand, an open subset of $R^n$ is not compact. However, it can be written as a countable union of compact sets. In fact, if $U \subset R^n$ is an open set, let

\[ A_n = \{x \in U : \|x\| \le n \text{ and } \rho(x, \partial U) \ge 1/n\}. \]

It is easy to check that $A_n$ is compact and that

\[ \bigcup_n A_n = U. \]


In view of condition (2), we can say the same for any manifold $M$ under consideration:

Proposition 1.1. Any manifold $M$ satisfying (1) and (2) can be written as

\[ M = \bigcup_{i=1}^{\infty} A_i, \]

where each $A_i \subset M$ is compact.

Proof. In fact, by (2)

\[ M = \bigcup_{j=1}^{\infty} U_j, \]

and by the preceding discussion each $U_j$ can be written as the countable union of compact sets. Since the countable union of a countable union is still countable, we obtain Proposition 1.1. □




An immediate corollary is:

Proposition 1.2. Let $M$ be a manifold [satisfying (1) and (2)], and let $\{U_\iota\}$ be an open covering of $M$. Then we can select a countable subcollection $\{U_j\}$ such that

\[ \bigcup_j U_j = M. \]

Proof. Write $M = \bigcup A_r$, where $A_r$ is compact. For each $r$ we can choose finitely many $U_{r,1}, U_{r,2}, \dots, U_{r,k_r}$ so that

\[ A_r \subset U_{r,1} \cup \cdots \cup U_{r,k_r}. \]

The collection

\[ \{U_{r,k}\}_{r = 1, 2, \dots;\ k = 1, \dots, k_r} \]

is a countable subcollection covering $M$. □


2. PARTITIONS OF UNITY 


In the following discussion it will be convenient for us to have a method of “breaking up” functions, vector fields, etc., into “little pieces”. For this purpose we introduce the following notation:

Definition 2.1. A collection $\{g_i\}$ of $C^\infty$-functions is said to be a partition of unity if

i) $g_i \ge 0$ for all $i$;

ii) $\operatorname{supp} g_i$† is compact for all $i$;

iii) each $x \in M$ has a neighborhood $V_x$ such that $V_x \cap \operatorname{supp} g_i = \emptyset$ for all but a finite number of $i$; and

iv) $\sum g_i(x) = 1$ for all $x \in M$.

Note that in view of (iii) the sum occurring in (iv) is actually finite, since for any $x$ all but a finite number of the $g_i(x)$ vanish. Note also that:

Proposition 2.1. If $A$ is a compact set and $\{g_i\}$ is a partition of unity, then

\[ A \cap \operatorname{supp} g_i = \emptyset \]

for all but a finite number of $i$.

Proof. In fact, each $x \in A$ has a neighborhood $V_x$ given by (iii). The sets $\{V_x\}_{x \in A}$ form an open covering of $A$. Since $A$ is compact, we can select a finite subcollection $\{V_{x_1}, \dots, V_{x_r}\}$ with $A \subset V_{x_1} \cup \cdots \cup V_{x_r}$. Since each $V_{x_k}$ has a nonempty intersection with only finitely many of the $\operatorname{supp} g_i$, so does their union, and so a fortiori does $A$. □

† Recall that $\operatorname{supp} g$ is the closure of the set $\{x : g(x) \ne 0\}$.




Definition 2.2. Let $\{U_\iota\}$ be an open covering of $M$, and let $\{g_j\}$ be a partition of unity. We say that $\{g_j\}$ is subordinate to $\{U_\iota\}$ if for every $j$ there exists an $\iota(j)$ such that

\[ \operatorname{supp} g_j \subset U_{\iota(j)}. \tag{2.1} \]

Theorem 2.1. Let $\{U_\iota\}$ be any open covering of $M$. There exists a partition of unity $\{g_j\}$ subordinate to $\{U_\iota\}$.

The proof that we shall present below is due to Bonic and Frampton.†
First we introduce some preliminary notions.
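The function $f$ on $R$ defined by

\[ f(u) = \begin{cases} e^{-1/u} & \text{if } u > 0, \\ 0 & \text{if } u \le 0 \end{cases} \]

is $C^\infty$. For $u \ne 0$ it is clear that $f$ has derivatives of all orders. To check that $f$ is $C^\infty$ at $0$, it suffices to show that $f^{(k)}(u) \to 0$ as $u \to 0$ from the right. But $f^{(k)}(u) = P_k(1/u)\,e^{-1/u}$, where $P_k$ is a polynomial of degree $2k$. So

\[ \lim_{u \to 0} f^{(k)}(u) = \lim_{s \to \infty} P_k(s)\,e^{-s} = 0, \]

since $e^s$ goes to infinity faster than any polynomial.

Note that $f(u) > 0$ if and only if $u > 0$. Now consider the function $g_a^b$ on $R$ defined by

\[ g_a^b(x) = f(x - a)\,f(b - x). \]

Then $g_a^b$ is $C^\infty$ and nonnegative and

\[ g_a^b(x) \ne 0 \quad \text{if and only if} \quad a < x < b. \]

More generally, if $a = \langle a^1, \dots, a^k \rangle$ and $b = \langle b^1, \dots, b^k \rangle$, define the function $g_a^b$ on $R^k$ by setting

\[ g_a^b(x) = g_{a^1}^{b^1}(x^1)\,g_{a^2}^{b^2}(x^2) \cdots g_{a^k}^{b^k}(x^k), \]

where $x = \langle x^1, \dots, x^k \rangle$. Then $g_a^b \ge 0$, $g_a^b \in C^\infty$, and

\[ g_a^b(x) > 0 \quad \text{if and only if} \quad a^1 < x^1 < b^1, \dots, a^k < x^k < b^k. \tag{2.2} \]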
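The functions $f$ and $g_a^b$ are easy to experiment with. The sketch below is a direct Python transcription (standard library only); the names `g_interval` and `g_box` are illustrative, and in floating point $e^{-1/u}$ underflows to $0$ for very small positive $u$, so positivity is only visible away from the endpoints.

```python
import math

def f(u):
    """f(u) = exp(-1/u) for u > 0 and f(u) = 0 for u <= 0."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def g_interval(a, b, x):
    """One-variable bump f(x - a) f(b - x), positive exactly when a < x < b."""
    return f(x - a) * f(b - x)

def g_box(a, b, x):
    """Product of one-variable bumps, positive exactly on the open box of (2.2)."""
    result = 1.0
    for ai, bi, xi in zip(a, b, x):
        result *= g_interval(ai, bi, xi)
    return result

inside = g_box((0.0, 0.0), (1.0, 2.0), (0.5, 1.0))   # point inside the box
outside = g_box((0.0, 0.0), (1.0, 2.0), (0.5, 2.5))  # second coordinate outside
```

Here `inside` is positive and `outside` is exactly zero, as (2.2) requires.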


Lemma. Let $f_1, \dots, f_k$ be $C^\infty$-functions on a manifold $M$, and let $W = \{x : a^1 < f_1(x) < b^1, \dots, a^k < f_k(x) < b^k\}$. There exists a nonnegative $C^\infty$-function $g$ such that $W = \{x : g(x) > 0\}$.

In fact, if we define $g$ by

\[ g(x) = g_a^b(f_1(x), \dots, f_k(x)), \]

then it is clear that $g$ has the desired properties.

† Smooth functions on Banach manifolds, J. Math. and Mech. 15, 877-898 (1966).




We now turn to the proof of Theorem 2.1.

Proof. For each $x \in M$ choose a $U_\iota$ containing $x$ and a chart $(U, \alpha)$ about $x$. Then $\alpha(U \cap U_\iota)$ is an open set containing $\alpha(x)$ in $R^n$. Choose $a$ and $b$ such that

\[ \alpha(x) \in \operatorname{int} \square_a^b \quad \text{and} \quad \square_a^b \subset \alpha(U \cap U_\iota). \]

Let $W_x = \alpha^{-1}(\operatorname{int} \square_a^b)$. Then

\[ \overline{W}_x \subset U_\iota \quad \text{and} \quad \overline{W}_x \text{ is compact.} \tag{2.3} \]

Also if $x^1, \dots, x^n$ are the coordinates given by $\alpha$,

\[ W_x = \{y : a^1 < x^1(y) < b^1, \dots, a^n < x^n(y) < b^n\}. \]

By our lemma we can find a nonnegative $C^\infty$-function $f_x$ such that

\[ W_x = \{y : f_x(y) > 0\}. \]

Since $x \in W_x$, the $\{W_x\}$ cover $M$. By Proposition 1.2 we can select a countable subcovering $\{W_i\}$. Let us denote the corresponding functions by $f_i$; that is, if $W_i = W_x$, we set $f_i = f_x$.
Let

\[ V_1 = W_1 = \{x : f_1(x) > 0\}, \]
\[ V_2 = \{x : f_2(x) > 0,\ f_1(x) < \tfrac12\}, \]
\[ \vdots \]
\[ V_r = \{x : f_r(x) > 0,\ f_1(x) < 1/r, \dots, f_{r-1}(x) < 1/r\}. \]

It is clear that $V_j$ is open and that $V_j \subset W_j$, so that, by (2.3),

\[ \overline{V}_j \text{ is compact and } \overline{V}_j \subset U_{\iota(j)} \tag{2.4} \]

for some $\iota = \iota(j)$.
For each $x \in M$ let $q(x)$ denote the first integer $q$ for which $f_q(x) > 0$. Thus $f_p(x) = 0$ if $p < q(x)$ and $f_{q(x)}(x) > 0$.
Let $V_x = \{y : f_{q(x)}(y) > \tfrac12 f_{q(x)}(x)\}$. Since $f_{q(x)}(x) > 0$, it follows that $x \in V_x$ and $V_x$ is open. Furthermore,

\[ V_x \cap V_r = \emptyset \quad \text{if } r > q(x) \text{ and } 1/r < \tfrac12 f_{q(x)}(x). \tag{2.5} \]

According to the lemma, each set $V_j$ can be given as $V_j = \{x : \tilde g_j(x) > 0\}$, where $\tilde g_j$ is a suitable $C^\infty$-function. Let $g = \sum \tilde g_j$. In view of (2.5) this is really a finite sum in the neighborhood of any $x$. Thus $g$ is $C^\infty$. Now $\tilde g_{q(x)}(x) > 0$, since $x \in V_{q(x)}$. Thus $g > 0$. Set

\[ g_j = \frac{\tilde g_j}{g}. \]

We claim that $\{g_j\}$ is the desired partition of unity. In fact, (i) holds by our construction, (ii) and (2.1) follow from (2.4), (iii) follows from (2.5), and (iv) holds by construction. □
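The final normalization step $g_j = \tilde g_j / g$ can be seen at work in a simple special case. The following Python sketch (standard library only) builds a partition of unity on the real line from bumps supported on $[k-1, k+1]$, which is subordinate to, for instance, the cover by the intervals $(k-2, k+2)$. This cover is already locally finite, so the sketch skips the sets $V_r$ of the proof; all names are illustrative assumptions.

```python
import math

def f(u):
    """f(u) = exp(-1/u) for u > 0 and 0 otherwise."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def bump(a, b, x):
    """Smooth nonnegative function that is positive exactly on (a, b)."""
    return f(x - a) * f(b - x)

def g_tilde(k, x):
    """Unnormalized bump attached to the integer k, positive on (k - 1, k + 1)."""
    return bump(k - 1.0, k + 1.0, x)

def partition_of_unity(x):
    """Return {k: g_k(x)} for the finitely many k with g_k(x) != 0."""
    base = math.floor(x)
    ks = [k for k in range(base - 1, base + 3) if g_tilde(k, x) > 0]
    total = sum(g_tilde(k, x) for k in ks)  # locally finite sum, positive everywhere
    return {k: g_tilde(k, x) / total for k in ks}

values = partition_of_unity(0.3)
```

At $x = 0.3$ only the bumps for $k = 0$ and $k = 1$ are nonzero, and their normalized values add up to $1$, which is property (iv) of Definition 2.1.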




3. DENSITIES 


If we regard $R^n$ as a differentiable manifold, then the law for change of variables for an integral shows that the integrand does not have the same transition law as that of a function under change of chart. For this reason we cannot expect to integrate functions on a manifold. We now introduce the type of object that we can integrate.


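Definition 3.1. A density $\rho$ is a rule which assigns to each chart $(U, \alpha)$ of $M$ a function $\rho_\alpha$ defined on $\alpha(U)$ subject to the following transition law: If $(W, \beta)$ is a second chart of $M$, then

\[ \rho_\alpha(v) = \rho_\beta(\beta \circ \alpha^{-1}(v))\,|\det J_{\beta \circ \alpha^{-1}}(v)| \quad \text{for } v \in \alpha(U \cap W). \tag{3.1} \]

If $\mathcal{A}$ is an atlas of $M$ and functions $\rho_{\alpha_i}$ are given for all $(U_i, \alpha_i) \in \mathcal{A}$ satisfying (3.1), then the $\rho_{\alpha_i}$ define a density $\rho$ on $M$. In fact, if $(U, \alpha)$ is any chart of $M$ (not necessarily belonging to $\mathcal{A}$), define $\rho_\alpha$ by

\[ \rho_\alpha(v) = \rho_{\alpha_i}(\alpha_i \circ \alpha^{-1}(v))\,|\det J_{\alpha_i \circ \alpha^{-1}}(v)| \quad \text{if } v \in \alpha(U \cap U_i). \]

This definition is consistent: If $v \in \alpha(U \cap U_i) \cap \alpha(U \cap U_j)$, then by (3.1),

\[ \rho_{\alpha_j}(\alpha_j \circ \alpha^{-1}(v))\,|\det J_{\alpha_j \circ \alpha^{-1}}(v)| \]
\[ = \rho_{\alpha_i}\bigl(\alpha_i \circ \alpha_j^{-1}(\alpha_j \circ \alpha^{-1}(v))\bigr)\,|\det J_{\alpha_i \circ \alpha_j^{-1}}(\alpha_j \circ \alpha^{-1}(v))|\,|\det J_{\alpha_j \circ \alpha^{-1}}(v)| \]
\[ = \rho_{\alpha_i}(\alpha_i \circ \alpha^{-1}(v))\,|\det J_{\alpha_i \circ \alpha^{-1}}(v)| \]

by the chain rule and the multiplicative property of determinants.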

In view of (3.1) it makes sense to talk about local smoothness properties of densities. We will say that a density $\rho$ is $C^\infty$ if for any chart $(U, \alpha)$ the function $\rho_\alpha$ is $C^\infty$. As usual, it suffices to verify this for all charts $(U, \alpha)$ belonging to some atlas. Similarly, we say that a density $\rho$ is locally absolutely integrable if for any chart $(U, \alpha)$ the function $\rho_\alpha$ is absolutely integrable. By the last proposition of Chapter 8 this is again independent of the choice of atlases.

Let $\rho$ be a density on $M$, and let $x$ be a point of $M$. It does not make sense to talk about the value of $\rho$ at $x$. However, (3.1) shows that it does make sense to talk about the sign of $\rho$ at $x$. More precisely, we say that

\[ \rho > 0 \text{ at } x \quad \text{if} \quad \rho_\alpha(\alpha(x)) > 0 \tag{3.2} \]

for a chart $(U, \alpha)$ about $x$. Equation (3.1) shows that if $\rho_\alpha(\alpha(x)) > 0$, then $\rho_\beta(\beta(x)) > 0$ for any other chart $(W, \beta)$ about $x$. Similarly, it makes sense to say that $\rho < 0$ at $x$, $\rho \ge 0$ at $x$, or $\rho \ne 0$ at $x$.

Definition 3.2. Let $\rho$ be a density on $M$. By the support of $\rho$, denoted by $\operatorname{supp} \rho$, we shall mean the closure of the set of points of $M$ at which $\rho$ does not vanish. That is,

\[ \operatorname{supp} \rho = \overline{\{x : \rho \ne 0 \text{ at } x\}}. \]




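Let $\rho_1$ and $\rho_2$ be densities. We define their sum by setting

\[ (\rho_1 + \rho_2)_\alpha = \rho_{1\alpha} + \rho_{2\alpha} \tag{3.3} \]

for any chart $(U, \alpha)$. It is immediate that the right-hand side of (3.3) satisfies the transition law (3.1), and so defines a density on $M$.
Let $\rho$ be a density, and let $f$ be a function. We define the density $f\rho$ by

\[ (f\rho)_\alpha = f_\alpha\,\rho_\alpha. \tag{3.4} \]

Again, the verification of (3.1) is immediate in view of the transition laws for functions.
It is clear that

\[ \operatorname{supp}(\rho_1 + \rho_2) \subset \operatorname{supp} \rho_1 \cup \operatorname{supp} \rho_2 \tag{3.5} \]

and

\[ \operatorname{supp}(f\rho) = \operatorname{supp} f \cap \operatorname{supp} \rho. \tag{3.6} \]

We shall write

\[ \rho_1 \le \rho_2 \text{ at } x \quad \text{if} \quad \rho_2 - \rho_1 \ge 0 \text{ at } x \]

and

\[ \rho_1 \le \rho_2 \quad \text{if} \quad \rho_1 \le \rho_2 \text{ at all } x \in M. \]

Let $P$ denote the space of locally absolutely integrable densities of compact support. We observe that $P$ is a vector space and that the product $f\rho$ belongs to $P$ if $f$ is a (bounded) locally contented function and $\rho \in P$.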


Theorem 3.1. There exists a unique linear function $\int$ on $P$ satisfying the following condition: If $\rho \in P$ is such that $\operatorname{supp} \rho \subset U$, where $(U, \alpha)$ is a chart of $M$, then

\[ \int \rho = \int_{\alpha(U)} \rho_\alpha. \tag{3.7} \]


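Proof. We first show that there is at most one linear function satisfying (3.7). Let $\mathcal{A}$ be an atlas of $M$, and let $\{g_j\}$ be a partition of unity subordinate to $\mathcal{A}$. For each $j$ choose an $i(j)$ so that

\[ \operatorname{supp} g_j \subset U_{i(j)}. \]

Write $\rho = 1 \cdot \rho = \sum g_j \cdot \rho$. Since $\operatorname{supp} \rho$ is compact, only finitely many of the terms $g_j\rho$ are not identically zero. Thus the sum is finite. Since $\int$ is linear,

\[ \int \rho = \int \sum g_j\rho = \sum \int g_j\rho. \]

By (3.7),

\[ \int g_j\rho = \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}. \]

Thus

\[ \int \rho = \sum_j \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}. \tag{3.8} \]

Thus $\int$, if it exists, must be given by (3.8). To establish the existence of $\int$, we must show that (3.8) defines a linear function on $P$ satisfying (3.7). The linearity is obvious; we must verify (3.7).
Suppose $\operatorname{supp} \rho \subset U$ for some chart $(U, \alpha)$. We must show that

\[ \int_{\alpha(U)} \rho_\alpha = \sum_j \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}. \]

Since $\rho = \sum g_j\rho$ and therefore $\rho_\alpha = \sum (g_j\rho)_\alpha$, it suffices to show that

\[ \int_{\alpha(U)} (g_j\rho)_\alpha = \int_{\alpha_{i(j)}(U_{i(j)})} (g_j\rho)_{\alpha_{i(j)}}, \tag{3.9} \]

where $\operatorname{supp} g_j\rho \subset U \cap U_{i(j)}$. By (3.1),

\[ (g_j\rho)_\alpha = (g_j\rho)_{\alpha_{i(j)}} \circ (\alpha_{i(j)} \circ \alpha^{-1}) \cdot |\det J_{\alpha_{i(j)} \circ \alpha^{-1}}|, \]

so that (3.9) holds by the transformation law for integrals in $R^n$. □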
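A one-dimensional numerical example shows the chart independence in (3.9). In the Python sketch below (standard library only), $M = (0, \infty)$ carries the charts $\alpha = \mathrm{id}$ and $\beta = \log$; the particular density and the helper `integrate` are hypothetical choices for illustration.

```python
import math

def integrate(h, a, b, n=20000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return sum(h(a + (k + 0.5) * step) for k in range(n)) * step

def rho_alpha(v):
    """Local expression of rho in the chart alpha = identity; supported in [1, 3]."""
    return (v - 1.0) ** 2 * (3.0 - v) ** 2 if 1.0 < v < 3.0 else 0.0

def rho_beta(w):
    """Local expression in the chart beta = log.

    By (3.1), rho_alpha(v) = rho_beta(log v) * |d(log v)/dv|, hence
    rho_beta(w) = rho_alpha(exp(w)) * exp(w).
    """
    return rho_alpha(math.exp(w)) * math.exp(w)

I_alpha = integrate(rho_alpha, 1.0, 3.0)
I_beta = integrate(rho_beta, 0.0, math.log(3.0))
```

Both integrals approximate $16/15$, so the value of $\int \rho$ does not depend on which chart is used to compute it.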
We can derive a number of useful properties of the integral from the formula (3.8):

\[ \text{if } \rho_1 \le \rho_2, \text{ then } \int \rho_1 \le \int \rho_2. \tag{3.10} \]

In fact, since $g_j \ge 0$, we have $(g_j\rho_1)_\alpha \le (g_j\rho_2)_\alpha$ for any chart $(U, \alpha)$. Thus (3.10) follows from the corresponding fact on $R^n$ if we use (3.8).

Let us say that a set $A$ has content zero if $A \subset A_1 \cup \cdots \cup A_p$, where each $A_i$ is compact, $A_i \subset U_i$ for some chart $(U_i, \alpha_i)$, and $\alpha_i(A_i)$ has content zero in $R^n$. It is easy to see that the union of any finite number of sets of content zero has content zero. It is also clear that the function $e_A$ is contented.


Let us call a set $B \subset M$ contented if the function $e_B$ is contented. For any $\rho \in P$ we define $\int_B \rho$ by

\[ \int_B \rho = \int e_B\,\rho. \tag{3.11} \]

It follows from (3.8) that

\[ \int_A \rho = 0 \]

for any $\rho \in P$ if $A$ has content zero. We can thus ignore sets of content zero for the purpose of integration. In practice, one usually takes advantage of this when computing integrals, rather than using (3.8). For instance, in computing an integral over $S^n$, we can “ignore” any meridian: for example, if

\[ A = \{x \in S^n : x = \langle t, 0, \dots, 0, \pm\sqrt{1 - t^2} \rangle \in R^{n+1}\}, \]

then

\[ \int_{S^n} \rho = \int_{S^n - A} \rho \quad \text{for any } \rho. \]

This means that we can compute $\int_{S^n} \rho$ by introducing polar coordinates (Fig. 10.1) and expressing $\rho$ in terms of them. Thus in $S^2$, if $U = S^2 - A$ and $\alpha$ is the polar coordinate chart on $U$, then

\[ \int_{S^2} \rho = \int_0^{2\pi} \int_0^{\pi} \rho_\alpha\,d\theta\,d\varphi. \]

Fig. 10.1

It is worth observing that if $N$ is a differentiable manifold of dimension less than $\dim M$ and $\psi$ is a differentiable map of $N \to M$, then Proposition 7.3 of Chapter 8 implies that if $A$ is any compact subset of $N$, then $\psi(A)$ has content zero in $M$. In this sense, one can ignore “lower-dimensional sets” when integrating on $M$.


4. VOLUME DENSITY OF A RIEMANN METRIC 


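Let $M$ be a differentiable manifold with a Riemann metric $m$. We define the density $\sigma$ $[= \sigma(m)]$ as follows. For each chart $(U, \alpha)$ with coordinates $x^1, \dots, x^n$ let

\[ \sigma_\alpha(\alpha(x)) = \Bigl| \det\Bigl( \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr) \Bigr) \Bigr|^{1/2} = |\det(g_{ij}(x))|^{1/2}. \tag{4.1} \]

Here

\[ \Bigl( \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr) \Bigr) \]

is the matrix whose $ij$th entry is the scalar product of the vectors

\[ \frac{\partial}{\partial x^i}(x) \quad \text{and} \quad \frac{\partial}{\partial x^j}(x), \]

so that (in view of Exercise 8.1 of Chapter 8)

$\sigma_\alpha(\alpha(x))$ = volume of the parallelepiped spanned by $(\partial/\partial x^i)(x)$ with respect to the Euclidean metric $(\ ,\ )_{m,x}$ on $T_x(M)$.

It is easy to see that (4.1) actually defines a density. Let $(W, \beta)$ be a second chart about $x$ with coordinates $y^1, \dots, y^n$. Then

\[ \sigma_\beta(\beta(x)) = \Bigl| \det\Bigl( \Bigl( \frac{\partial}{\partial y^k}(x), \frac{\partial}{\partial y^l}(x) \Bigr) \Bigr) \Bigr|^{1/2}. \]

Now

\[ \Bigl( \frac{\partial}{\partial y^k}, \frac{\partial}{\partial y^l} \Bigr) = \sum_{i,j} \frac{\partial x^i}{\partial y^k}\,\frac{\partial x^j}{\partial y^l}\,\Bigl( \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \Bigr) \]

for all $k$ and $l$. We can write this as the matrix equation

\[ \Bigl( \Bigl( \frac{\partial}{\partial y^k}, \frac{\partial}{\partial y^l} \Bigr) \Bigr) = \Bigl[ \frac{\partial x^i}{\partial y^k} \Bigr] \Bigl( \Bigl( \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \Bigr) \Bigr) \Bigl[ \frac{\partial x^j}{\partial y^l} \Bigr], \]

so that

\[ \sigma_\beta(\beta(x)) = \Bigl| \det\Bigl[ \frac{\partial x^i}{\partial y^k} \Bigr] \det\Bigl( \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr) \Bigr) \det\Bigl[ \frac{\partial x^j}{\partial y^l} \Bigr] \Bigr|^{1/2} \]
\[ = \Bigl| \det\Bigl( \Bigl( \frac{\partial}{\partial x^i}(x), \frac{\partial}{\partial x^j}(x) \Bigr) \Bigr) \Bigr|^{1/2}\,\Bigl| \det\Bigl[ \frac{\partial x^i}{\partial y^j} \Bigr] \Bigr| \]
\[ = \sigma_\alpha(\alpha(x))\,\Bigl| \det\Bigl[ \frac{\partial x^i}{\partial y^j} \Bigr](x) \Bigr|. \]

Since $[\partial x^i/\partial y^j]$ is the Jacobian matrix of $\alpha \circ \beta^{-1}$, this is the transition law (3.1) with the roles of $\alpha$ and $\beta$ interchanged.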


If $M$ is an open subset of Euclidean space with the Euclidean metric, then the volume density, when integrated over any contented set, yields the ordinary Euclidean volume of that set. In fact, if $x^1, \dots, x^n$ are orthonormal coordinates corresponding to the identity chart, then $g_{ij}(x) = 0$ if $i \ne j$ and $g_{ii} = 1$, so that $\sigma_{\mathrm{id}} = 1$ and thus

\[ \int_A \sigma = \int_A 1 = \mu(A). \]
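The same volume comes out in any other chart. As a minimal sketch (Python, standard library only), the code below evaluates (4.1) for the Euclidean metric of $R^2$ in polar coordinates and integrates the resulting density over $0 < r < 1$, $0 < \theta < 2\pi$; the omitted ray has content zero and may be ignored. The function names and the grid size are illustrative assumptions.

```python
import math

def sigma_polar(r, theta):
    """|det(g_ij)|^(1/2) of (4.1) for the Euclidean metric of R^2 in the polar chart."""
    J = [[math.cos(theta), -r * math.sin(theta)],
         [math.sin(theta),  r * math.cos(theta)]]
    # g_ij = sum_k J[k][i] * J[k][j], the Gram matrix of the coordinate vector fields.
    g = [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return math.sqrt(abs(g[0][0] * g[1][1] - g[0][1] * g[1][0]))

def disk_area(n=200):
    """Midpoint rule for the integral of sigma over 0 < r < 1, 0 < theta < 2 pi."""
    dr, dth = 1.0 / n, 2 * math.pi / n
    return sum(sigma_polar((i + 0.5) * dr, (j + 0.5) * dth)
               for i in range(n) for j in range(n)) * dr * dth

area = disk_area()
```

The density reduces to $r$, and the integral reproduces the Euclidean area $\pi$ of the unit disk.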


More generally, let $\varphi$ be an immersion of a $k$-dimensional manifold $M$ into $R^n$ such that $\varphi(M)$ is an open subset of a $k$-dimensional hyperplane in $R^n$, and let $m$ be the Riemann metric induced on $M$ by $\varphi$. Then, if $\sigma$ denotes the corresponding volume density, $\int_A \sigma$ is the $k$-dimensional Euclidean volume of $\varphi(A)$. In fact, by a Euclidean motion, we may assume that $\varphi$ maps $M$ into $R^k \subset R^n$. Then, since $\varphi$ is an immersion and $M$ is $k$-dimensional, we can use $x^1, \dots, x^k$ as coordinates on $M$ and conclude, as before, that $\sigma$ is given by the function $1$ in terms of these coordinates, and hence that $\int_A \sigma = \mu(\varphi(A))$.

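Now let $\varphi_1$ and $\varphi_2$ be two immersions of $M \to R^n$. Let $(U, \alpha)$ be a coordinate chart on $M$ with coordinates $y^1, \dots, y^k$. If $m_i$ is the Riemann metric induced by $\varphi_i$, then

\[ \Bigl( \frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j} \Bigr)_{m_1} = \Bigl( \frac{\partial \varphi_1}{\partial y^i}, \frac{\partial \varphi_1}{\partial y^j} \Bigr) \]

and

\[ \Bigl( \frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j} \Bigr)_{m_2} = \Bigl( \frac{\partial \varphi_2}{\partial y^i}, \frac{\partial \varphi_2}{\partial y^j} \Bigr), \]

where the scalar product on the right is the Euclidean scalar product. Let $\sigma_1$ and $\sigma_2$ be the volume densities corresponding to $m_1$ and $m_2$. Then

\[ \sigma_{1\alpha} = \Bigl| \det\Bigl( \Bigl( \frac{\partial \varphi_1}{\partial y^i}, \frac{\partial \varphi_1}{\partial y^j} \Bigr) \Bigr) \Bigr|^{1/2} \]

and

\[ \sigma_{2\alpha} = \Bigl| \det\Bigl( \Bigl( \frac{\partial \varphi_2}{\partial y^i}, \frac{\partial \varphi_2}{\partial y^j} \Bigr) \Bigr) \Bigr|^{1/2}. \]

In particular, given an $L > 0$, there is a $K = K(k, n, L)$ such that if

\[ \Bigl\| \frac{\partial \varphi_1}{\partial y^i} \Bigr\| < L \quad \text{and} \quad \Bigl\| \frac{\partial \varphi_2}{\partial y^i} \Bigr\| < L \quad \text{for all } i = 1, \dots, k, \]

then, by the mean-value theorem,

\[ |\sigma_{1\alpha} - \sigma_{2\alpha}| < K \Bigl( \Bigl\| \frac{\partial \varphi_2}{\partial y^1} - \frac{\partial \varphi_1}{\partial y^1} \Bigr\| + \cdots + \Bigl\| \frac{\partial \varphi_2}{\partial y^k} - \frac{\partial \varphi_1}{\partial y^k} \Bigr\| \Bigr). \]

Fig. 10.2

Roughly speaking, this means that if $\varphi_1$ and $\varphi_2$ are close, in the sense that their derivatives are close, then the densities they induce are close.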


We apply this remark to the following situation. We let $\varphi_1$ be an immersion of $M$ into $R^n$ and let $(W, \alpha)$ be some chart of $M$ with coordinates $y^1, \dots, y^k$. We let $U = W - C = \bigcup U_l$, where $C$ is some closed set of content zero and such that $U_l \cap U_{l'} = \emptyset$ if $l \ne l'$. For each $l$ let $z_l$ be a point of $U_l$ whose coordinates are $\langle y_l^1, \dots, y_l^k \rangle$, and for $z = \langle y^1, \dots, y^k \rangle$ define $\varphi_2$ by setting

\[ \varphi_2(y^1, \dots, y^k) = \varphi_1(z_l) + \sum_i (y^i - y_l^i)\,\frac{\partial \varphi_1}{\partial y^i}(z_l) \]

if $z \in U_l$. (See Fig. 10.2.)

If the $U_l$'s are sufficiently small, then

\[ \Bigl\| \frac{\partial \varphi_2}{\partial y^i} - \frac{\partial \varphi_1}{\partial y^i} \Bigr\| \]

will be small. More generally, we could choose $\varphi_2$ to be any affine linear map approximating $\varphi_1$ on each $U_l$. We thus see that the volume of $W$ in terms of the Riemann metric induced by $\varphi_1$ is the limit of the (surface) volume of polyhedra approximating $\varphi_1(W)$. Here the approximation must be in the sense of slope (i.e., the derivatives must be close) and not merely in the sense of position.

The construction of the volume density can be generalized and suggests an alternative definition of the notion of density. In fact, let $\rho$ be a rule which assigns to each $x$ in $M$ a function, $\rho_x$, on $n$ tangent vectors in $T_x(M)$ subject to the rule

\[ \rho_x(A\xi_1, \dots, A\xi_n) = |\det A|\,\rho_x(\xi_1, \dots, \xi_n), \tag{4.2} \]

where $\xi_i \in T_x(M)$ and $A: T_x(M) \to T_x(M)$ is a linear transformation. Then we see that $\rho$ determines a density by setting

\[ \rho_\alpha(\alpha(x)) = \rho_x\Bigl( \frac{\partial}{\partial u^1}(x), \dots, \frac{\partial}{\partial u^n}(x) \Bigr) \tag{4.3} \]

if $(U, \alpha)$ is a chart with coordinates $u^1, \dots, u^n$. The fact that (4.3) defines a density follows immediately from (4.2) and the transformation law for the $\partial/\partial u^i$ under change of coordinates.

Conversely, given a density $\rho$ in terms of the $\rho_\alpha$, define $\rho(\partial/\partial u^1, \dots, \partial/\partial u^n)$ by (4.3). Since the vectors $\{\partial/\partial u^i\}_{i=1,\dots,n}$ form a basis at each $x$ in $U$, any $\xi_1, \dots, \xi_n$ in $T_x(M)$ can be written as

\[ \xi_i = B\,\frac{\partial}{\partial u^i}(x), \]

where $B$ is a linear transformation of $T_x(M)$ into itself. Then (4.2) determines $\rho(\xi_1, \dots, \xi_n)$ as

\[ \rho(\xi_1, \dots, \xi_n) = |\det B|\,\rho_\alpha(\alpha(x)). \tag{4.4} \]

That this definition is consistent (i.e., doesn't depend on $\alpha$) follows from (4.2) and the transformation law (3.1) for densities.


EXERCISES 


4.1 Let $M = S^1 \times S^1$ be the torus, and let $\varphi: M \to R^4$ be given by

\[ x^1 \circ \varphi(\theta^1, \theta^2) = \cos\theta^1, \]
\[ x^2 \circ \varphi(\theta^1, \theta^2) = \sin\theta^1, \]
\[ x^3 \circ \varphi(\theta^1, \theta^2) = 2\cos\theta^2, \]
\[ x^4 \circ \varphi(\theta^1, \theta^2) = 2\sin\theta^2, \]

where $x^1, \dots, x^4$ are the rectangular coordinates on $R^4$ and $\theta^1, \theta^2$ are angular coordinates on $M$.

a) Express the Riemann metric induced on $M$ by $\varphi$ (from the Euclidean metric on $R^4$) in terms of the coordinates $\theta^1, \theta^2$. [That is, compute the $g_{ij}(\theta^1, \theta^2)$.]
b) What is the volume of $M$ relative to this Riemann metric?


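4.2 Consider the Riemann metric induced on $S^1 \times S^1$ by the immersion $\varphi$ into $E^3$ by

\[ x \circ \varphi(u, v) = (a - \cos u)\cos v, \]
\[ y \circ \varphi(u, v) = (a - \cos u)\sin v, \]
\[ z \circ \varphi(u, v) = \sin u, \]

where $u$ and $v$ are angular coordinates and $a > 2$. What is the total surface area of $S^1 \times S^1$ under this metric?

4.3 Let $\varphi$ map a region $U$ of the $xy$-plane into $E^3$ by the formula

\[ \varphi(x, y) = (x, y, F(x, y)), \]

so that $\varphi(U)$ is the surface $z = F(x, y)$. (See Fig. 10.3.) Show that the area of this surface is given by

\[ \int_U \sqrt{1 + \Bigl( \frac{\partial F}{\partial x} \Bigr)^2 + \Bigl( \frac{\partial F}{\partial y} \Bigr)^2}. \]

Fig. 10.3

4.4 Find the area of the paraboloid

\[ z = x^2 + y^2 \quad \text{for } x^2 + y^2 \le 1. \]

4.5 Let $U \subset R^2$, and let $\varphi: U \to E^3$ be given by

\[ \varphi(u, v) = (x(u, v), y(u, v), z(u, v)), \]

where $x$, $y$, $z$ are rectangular coordinates on $E^3$. Show the area of the surface $\varphi(U)$ is given by

\[ \int_U \sqrt{ \Bigl( \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} \Bigr)^2 + \Bigl( \frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial y}{\partial v}\frac{\partial z}{\partial u} \Bigr)^2 + \Bigl( \frac{\partial z}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial z}{\partial v}\frac{\partial x}{\partial u} \Bigr)^2 }. \]

4.6 Compute the surface area of the unit sphere in $E^3$.

4.7 Let $M_1$ and $M_2$ be differentiable manifolds, and let $\sigma$ be a density on $M_2$ which is nowhere zero. For each density $\rho$ on $M_1 \times M_2$, each product chart $(U_1 \times U_2, \alpha_1 \times \alpha_2)$, and each $x_2 \in U_2$, define the function $\rho_{1\alpha_1}(\cdot, x_2)$ by

\[ \rho_{1\alpha_1}(v_1, x_2)\,\sigma_{\alpha_2}(\alpha_2(x_2)) = \rho_{\alpha_1 \times \alpha_2}(v_1, \alpha_2(x_2)) \]

for all $v_1 \in \alpha_1(U_1)$.

a) Show that $\rho_{1\alpha_1}(v_1, x_2)$ is independent of the chart $(U_2, \alpha_2)$.
b) Show that for each fixed $x_2 \in M_2$ the functions $\rho_{1\alpha_1}(\cdot, x_2)$ define a density on $M_1$. We shall call this density $\rho_1(x_2)$.
c) Show that if $\rho$ is a smooth density of compact support on $M_1 \times M_2$ and $\sigma$ is smooth, then $\rho_1(x_2)$ is a smooth density of compact support on $M_1$.
d) Let $\rho$ be as in (c). Define the function $F_\rho$ on $M_2$ by

\[ F_\rho(x_2) = \int_{M_1} \rho_1(x_2). \]

Sketch how you would prove the fact that $F_\rho$ is a smooth function of compact support on $M_2$ and that

\[ \int_{M_1 \times M_2} \rho = \int_{M_2} F_\rho\,\sigma. \]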


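5. PULLBACK AND LIE DERIVATIVES OF DENSITIES

Let $\varphi: M_1 \to M_2$ be a diffeomorphism, and let $\rho$ be a density on $M_2$. Define the density $\varphi^*\rho$ on $M_1$ by

\[ \varphi^*\rho(\xi_1, \dots, \xi_n) = \rho(\varphi_*\xi_1, \dots, \varphi_*\xi_n) \tag{5.1} \]

for $\xi_i \in T_x(M_1)$ and $\varphi_* = \varphi_{*x}$. To show that $\varphi^*\rho$ is actually a density, we must check that (4.2) holds for any linear transformation $A$ of $T_x(M_1)$. But

\[ \varphi^*\rho(A\xi_1, \dots, A\xi_n) = \rho(\varphi_*A\xi_1, \dots, \varphi_*A\xi_n) \]
\[ = \rho(\varphi_*A\varphi_*^{-1}\varphi_*\xi_1, \dots, \varphi_*A\varphi_*^{-1}\varphi_*\xi_n) \]
\[ = |\det \varphi_*A\varphi_*^{-1}|\,\rho(\varphi_*\xi_1, \dots, \varphi_*\xi_n) \]
\[ = |\det A|\,\varphi^*\rho(\xi_1, \dots, \xi_n), \]

which is the desired identity.

Let $(U, \alpha)$ and $(W, \beta)$ be compatible charts on $M_1$ and $M_2$ with coordinates $u^1, \dots, u^n$ and $w^1, \dots, w^n$, respectively. Then for all points of $U$ we have, by (4.3),

\[ (\varphi^*\rho)_\alpha(\alpha(\cdot)) = \varphi^*\rho\Bigl( \frac{\partial}{\partial u^1}, \dots, \frac{\partial}{\partial u^n} \Bigr) = \rho\Bigl( \varphi_*\frac{\partial}{\partial u^1}, \dots, \varphi_*\frac{\partial}{\partial u^n} \Bigr) \]
\[ = \Bigl| \det\Bigl( \frac{\partial w^i}{\partial u^j} \Bigr) \Bigr|\,\rho\Bigl( \frac{\partial}{\partial w^1}, \dots, \frac{\partial}{\partial w^n} \Bigr) \]
\[ = \Bigl| \det\Bigl( \frac{\partial w^i}{\partial u^j} \Bigr) \Bigr|\,\rho_\beta(\beta \circ \varphi(\cdot)). \]

In other words, we have

\[ (\varphi^*\rho)_\alpha = |\det J_{\beta \circ \varphi \circ \alpha^{-1}}|\,\rho_\beta(\beta \circ \varphi \circ \alpha^{-1}(\cdot)). \tag{5.2} \]

The density $\varphi^*\rho$ is called the pullback of $\rho$ by $\varphi$. It is clear that

\[ \varphi^*(\rho_1 + \rho_2) = \varphi^*(\rho_1) + \varphi^*(\rho_2) \]

and that

\[ \varphi^*(f\rho) = \varphi^*[f]\,\varphi^*(\rho) \]

for any function $f$.
It follows directly from the definition that

\[ \operatorname{supp} \varphi^*\rho = \varphi^{-1}[\operatorname{supp} \rho]. \]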




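Proposition 5.1. Let $\varphi: M_1 \to M_2$ be a diffeomorphism, and let $\rho$ be a locally absolutely integrable density with compact support on $M_2$. Then

$$\int \varphi^*\rho = \int \rho. \tag{5.3}$$

Proof. It suffices to prove (5.3) for the case

$$\operatorname{supp} \rho \subset \varphi(U)$$

for some chart $(U, \alpha)$ of $M_1$ with $\varphi(U) \subset W$, where $(W, \beta)$ is a chart of $M_2$. In fact, the set of all such $\varphi(U)$ is an open covering of $M_2$, and we can therefore choose a partition of unity $\{g_j\}$ subordinate to it. If we write $\rho = \sum g_j\rho$, then the sum is finite and each $g_j\rho$ has the observed property. Since both sides of (5.3) are linear, we conclude that it suffices to prove (5.3) for each term.

Now if $\operatorname{supp} \rho \subset \varphi(U)$, then

$$\int \rho = \int_{\beta(W)} \rho_\beta = \int_{\beta\circ\varphi(U)} \rho_\beta$$

and, by (5.2) and the change of variables formula,

$$\int \varphi^*\rho = \int_{\alpha(U)} (\varphi^*\rho)_\alpha = \int_{\alpha(U)} \rho_\beta(\beta\circ\varphi\circ\alpha^{-1})\,|\det J_{\beta\circ\varphi\circ\alpha^{-1}}| = \int_{\beta\circ\varphi(U)} \rho_\beta,$$

thus establishing (5.3). □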
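Proposition 5.1 can be checked numerically in the simplest case $M_1 = M_2 = R$ with the identity charts. The following is a minimal sketch, not part of the text: the density `rho`, the diffeomorphism `phi`, and the helper `midpoint` are hypothetical choices made only for illustration, and the pullback is computed from the chart formula (5.2).

```python
def rho(y):
    """Chart expression of a compactly supported density on M2 = R (illustrative choice)."""
    return (1.0 - y * y) ** 2 if abs(y) < 1.0 else 0.0

def phi(x):
    """A diffeomorphism of R onto itself (strictly increasing; illustrative choice)."""
    return x + x ** 3

def dphi(x):
    """Derivative of phi, i.e. the 1x1 Jacobian appearing in (5.2)."""
    return 1.0 + 3.0 * x * x

def pullback(x):
    """Chart expression of phi^* rho according to (5.2)."""
    return rho(phi(x)) * abs(dphi(x))

def midpoint(f, a, b, n=200_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Both supports lie inside [-1, 1], so it is enough to integrate there.
integral_rho = midpoint(rho, -1.0, 1.0)
integral_pullback = midpoint(pullback, -1.0, 1.0)
print(integral_rho, integral_pullback)
```

The two printed values should agree up to quadrature error, and both approximate the exact value $16/15$ of $\int_{-1}^{1}(1-y^2)^2\,dy$.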




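Now let $\varphi_t$ be a one-parameter group on $M$ with infinitesimal generator $X$. Let $\rho$ be a density on $M$, let $(U, \alpha)$ be a chart, and let $W$ be an open subset of $U$ such that $\varphi_t(W) \subset U$ for all $|t| < \epsilon$. Then

$$(\varphi_t^*\rho)_\alpha(v) = \rho_\alpha(\Phi_\alpha(v, t))\,\left|\det\left(\frac{\partial\Phi_\alpha}{\partial v}\right)_{(v,t)}\right|,$$

where $\Phi_\alpha(v, t) = \alpha\circ\varphi_t\circ\alpha^{-1}(v)$ and $(\partial\Phi_\alpha/\partial v)_{(v,t)}$ is the Jacobian of $v \mapsto \Phi_\alpha(v, t)$. We would like to compute the derivative of this expression with respect to $t$ at $t = 0$. Now $\Phi_\alpha(v, 0) = v$, and so

$$\det\left(\frac{\partial\Phi_\alpha}{\partial v}\right)_{(v,0)} = 1.$$

Consequently, we can conclude that

$$\det\left(\frac{\partial\Phi_\alpha}{\partial v}\right)_{(v,t)} > 0$$

for $t$ close to zero. We can therefore omit the absolute-value sign and write, for $v \in \alpha(W)$,

$$\left.\frac{d(\varphi_t^*\rho)_\alpha}{dt}\right|_{t=0}(v) = \left.\frac{d\rho_\alpha(\Phi_\alpha(v,t))}{dt}\right|_{t=0} + \rho_\alpha(v)\left.\frac{d}{dt}\left(\det\frac{\partial\Phi_\alpha}{\partial v}\right)\right|_{t=0}.$$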




We simply evaluate the first derivative on the right by the chain rule, and get

$$d\rho_\alpha\left(\left.\frac{\partial\Phi_\alpha}{\partial t}\right|_{t=0}\right) = d\rho_\alpha(X_\alpha(v)).$$

In terms of coordinates $x^1, \dots, x^n$, we can write

$$\left.\frac{d\rho_\alpha(\Phi_\alpha(v, t))}{dt}\right|_{t=0} = \sum_i \frac{\partial\rho_\alpha}{\partial x^i}X_\alpha^i.$$

To evaluate the second term on the right, we need to make a preliminary observation. Let $A(t) = (a_{ij}(t))$ be a differentiable matrix-valued function of $t$ with $A(0) = \mathrm{id} = (\delta_{ij})$. Then

$$\left.\frac{d(\det A(t))}{dt}\right|_{t=0} = \lim_{t\to 0}\frac{1}{t}\,(\det A(t) - 1).$$

Now $a_{ii}(0) = 1$ and $a_{ij}(0) = 0$ $(i \ne j)$. To say that $A$ is differentiable means that each of the functions $a_{ij}(t)$ is differentiable. We can therefore find a constant $K$ such that $|a_{ij}(t)| \le K|t|$ $(i \ne j)$ and $|a_{ii}(t) - 1| \le K|t|$. In the expansion of $\det A(t)$, the only term which will not vanish at least as $t^2$ is the diagonal product $a_{11}(t)\cdots a_{nn}(t)$. In fact, any other term in $\sum \pm a_{1i_1}(t)\cdots a_{ni_n}(t)$ involves at least two off-diagonal terms and thus vanishes at least as $t^2$. Thus

$$\left.\frac{d}{dt}(\det A(t))\right|_{t=0} = \lim_{t\to 0}\frac{1}{t}\,(a_{11}(t)\cdots a_{nn}(t) - 1) = a_{11}'(0) + \cdots + a_{nn}'(0) = \operatorname{tr} A'(0).$$
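The identity $\frac{d}{dt}\det A(t)\big|_{t=0} = \operatorname{tr} A'(0)$ is easy to test numerically. The sketch below is an illustration under assumptions of our own: the matrix `B` (playing the role of $A'(0)$), the particular curve `A(t)`, and the helper `det` are hypothetical and chosen only so that $A(0) = \mathrm{id}$.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

# Hypothetical choice of A'(0); any square matrix would do.
B = [[0.5, 2.0, -1.0],
     [1.0, -0.3, 0.7],
     [0.2, 0.4, 1.1]]

def A(t):
    """A differentiable matrix curve with A(0) = id and A'(0) = B.

    The t*t term is added to every entry to show that higher-order
    terms do not affect the derivative at t = 0.
    """
    n = len(B)
    return [[(1.0 if i == j else 0.0) + math.sin(t) * B[i][j] + t * t
             for j in range(n)] for i in range(n)]

h = 1e-6
numeric = (det(A(h)) - det(A(-h))) / (2 * h)   # central difference at t = 0
trace = sum(B[i][i] for i in range(len(B)))    # tr A'(0)
print(numeric, trace)
```

The central difference and the trace should agree to within the accuracy of the finite-difference approximation.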
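If we take $A = \partial\Phi_\alpha/\partial v$, we conclude that

$$\left.\frac{d}{dt}\left(\det\frac{\partial\Phi_\alpha}{\partial v}\right)\right|_{t=0} = \operatorname{tr}\frac{\partial X_\alpha}{\partial v} = \sum_i \frac{\partial X_\alpha^i}{\partial x^i}.$$

Thus

$$\left.\frac{d(\varphi_t^*\rho)_\alpha}{dt}\right|_{t=0} = \sum_i \frac{\partial\rho_\alpha}{\partial x^i}X_\alpha^i + \rho_\alpha\sum_i \frac{\partial X_\alpha^i}{\partial x^i} = \sum_i \frac{\partial}{\partial x^i}(\rho_\alpha X_\alpha^i).$$

We repeat: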


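Proposition 5.2. Let $\varphi_t$ be a one-parameter group of diffeomorphisms of $M$ with infinitesimal generator $X$, and let $\rho$ be a differentiable density on $M$. Then

$$D_X\rho = \lim_{t\to 0}\frac{\varphi_t^*\rho - \rho}{t}$$

exists and is given locally by

$$(D_X\rho)_\alpha = \sum_i \frac{\partial}{\partial x^i}(\rho_\alpha X_\alpha^i)$$

if $X_\alpha = \langle X_\alpha^1, \dots, X_\alpha^n\rangle$ on the chart $(U, \alpha)$.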
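As a sanity check on Proposition 5.2, one can compare the difference quotient of $\varphi_t^*\rho$ with the local formula in one dimension. This is only a sketch under assumptions of our own: we take $M = R$ with the identity chart, the vector field $X = x\,\partial/\partial x$ (whose flow is $\varphi_t(x) = e^t x$), and the hypothetical density $\rho_\alpha(x) = e^{-x^2}$.

```python
import math

def rho(x):
    """Chart expression of the density (illustrative choice)."""
    return math.exp(-x * x)

def drho(x):
    """Derivative of rho."""
    return -2.0 * x * math.exp(-x * x)

def X(x):
    """Component of the vector field X = x d/dx."""
    return x

def flow(t, x):
    """The one-parameter group generated by X: phi_t(x) = e^t x."""
    return math.exp(t) * x

def dflow(t, x):
    """Jacobian (derivative in x) of phi_t."""
    return math.exp(t)

def pullback(t, x):
    """Chart expression of phi_t^* rho, as in (5.2)."""
    return rho(flow(t, x)) * abs(dflow(t, x))

def lie_numeric(x, h=1e-5):
    """Central difference in t at t = 0 of phi_t^* rho."""
    return (pullback(h, x) - pullback(-h, x)) / (2 * h)

def lie_formula(x):
    """d/dx (rho * X) = rho' * X + rho * X', with X' = 1 here."""
    return drho(x) * X(x) + rho(x) * 1.0

points = [-1.5, -0.2, 0.0, 0.7, 2.0]
errors = [abs(lie_numeric(x) - lie_formula(x)) for x in points]
print(max(errors))
```

The printed maximum error should be at the level of the finite-difference error, which is consistent with $(D_X\rho)_\alpha = \partial(\rho_\alpha X_\alpha)/\partial x$.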


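The density $D_X\rho$ is sometimes called the divergence of $\langle X, \rho\rangle$ and is denoted by $\operatorname{div}\langle X, \rho\rangle$. Thus $\operatorname{div}\langle X, \rho\rangle = D_X\rho$ is the density given by

$$(\operatorname{div}\langle X, \rho\rangle)_\alpha = \sum_i \frac{\partial}{\partial x^i}(X_\alpha^i\rho_\alpha) \quad\text{on } \alpha(U).$$

Now let $\rho$ be a differentiable density, and let $A$ be a compact contented set. Then

$$\begin{aligned}
\int_{\varphi_t(A)} \rho &= \int e_{\varphi_t(A)}\,\rho \\
&= \int \varphi_t^*(e_{\varphi_t(A)}\,\rho) \\
&= \int \varphi_t^*(e_{\varphi_t(A)})\,\varphi_t^*(\rho) \\
&= \int e_A\,\varphi_t^*(\rho) \\
&= \int_A \varphi_t^*\rho.
\end{aligned}$$

Thus

$$\frac{1}{t}\left(\int_{\varphi_t(A)} \rho - \int_A \rho\right) = \int_A \frac{1}{t}\,(\varphi_t^*\rho - \rho).$$

Fig. 10.4

Using a partition of unity, we can easily see that the limit under the integral sign is uniform, and we thus have the formula

$$\left.\frac{d}{dt}\left(\int_{\varphi_t(A)} \rho\right)\right|_{t=0} = \int_A D_X\rho = \int_A \operatorname{div}\langle X, \rho\rangle.$$


6. THE DIVERGENCE THEOREM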


Let $\varphi$ be a flow on a differentiable manifold $M$ with infinitesimal generator $X$. Let $\rho$ be a density belonging to $P$, and let $A$ be a contented subset of $M$. Then for small values of $t$, we would expect the difference $\int_{\varphi_t(A)}\rho - \int_A\rho$ to depend only on what is happening near the boundary of $A$ (Fig. 10.4). In the limit, we would expect the derivative of $\int_{\varphi_t(A)}\rho$ at $t = 0$ (which is given by $\int_A \operatorname{div}\langle X, \rho\rangle$) to be given by some integral over $\partial A$. In order to formulate such a result, we must first single out a class of sets whose boundaries are sufficiently nice to allow us to integrate over them. We therefore make the following definition:


Definition. Let $M$ be a differentiable manifold, and let $D$ be a subset of $M$. We say that $D$ is a domain with regular boundary if for every $x \in M$ there is a chart $(U, \alpha)$ about $x$, with coordinates $x^1, \dots, x^n$, such that one of the following three possibilities holds:

i) $U \cap D = \emptyset$;

ii) $U \subset D$;

iii) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \dots, v^n\rangle \in R^n : v^n \ge 0\}$.




Note that if $x \notin \overline{D}$, we can always find a chart $(U, \alpha)$ about $x$ such that (i) holds. If $x \in \operatorname{int} D$, we can always find a chart $(U, \alpha)$ about $x$ such that (ii) holds. This imposes no restrictions on $D$. The crucial condition is imposed when $x \in \partial D$. Then we cannot find charts about $x$ satisfying (i) or (ii). In this case, (iii) implies that $\alpha(U \cap \partial D)$ is an open subset of $R^{n-1}$ (Fig. 10.5). In fact, $\alpha(U \cap \partial D) = \{v \in \alpha(U) : v^n = 0\} = \alpha(U) \cap R^{n-1}$, where we regard $R^{n-1}$ as the subspace of $R^n$ consisting of those vectors with last component zero.


Fig. 10.5


Let $\mathcal{A}$ be an atlas of $M$ such that each chart of $\mathcal{A}$ satisfies either (i), (ii), or (iii). For each $(U, \alpha) \in \mathcal{A}$ consider the map $\alpha\upharpoonright\partial D: U \cap \partial D \to R^{n-1} \subset R^n$. [Of course, the maps $\alpha\upharpoonright\partial D$ will have a nonempty domain of definition only for charts of type (iii).] We claim that $\{(U \cap \partial D, \alpha\upharpoonright\partial D)\}$ is an atlas on $\partial D$. In fact, let $(U, \alpha)$ and $(W, \beta)$ be two charts in $\mathcal{A}$ such that $U \cap W \cap \partial D \ne \emptyset$. Let $x^1, \dots, x^n$ be the coordinates of $(U, \alpha)$, and let $y^1, \dots, y^n$ be those of $(W, \beta)$. The map $\beta\circ\alpha^{-1}$ is given by

$$\langle x^1, \dots, x^n\rangle \mapsto \langle y^1(x^1, \dots, x^n), \dots, y^n(x^1, \dots, x^n)\rangle.$$

On $\alpha(U \cap W \cap \partial D)$, we have $x^n = 0$ and $y^n = 0$. In particular,

$$y^n(x^1, \dots, x^{n-1}, 0) = 0,$$

and the functions $y^1(x^1, \dots, x^{n-1}, 0), \dots, y^{n-1}(x^1, \dots, x^{n-1}, 0)$ are differentiable. This shows that $(\beta\upharpoonright\partial D)\circ(\alpha\upharpoonright\partial D)^{-1}$ is differentiable on $\alpha(U \cap W \cap \partial D)$. We thus get a manifold structure on $\partial D$.

It is easy to see that this manifold structure is independent of the particular atlas of $M$ that was chosen. We shall denote by $\iota$ the map of $\partial D \to M$ which sends each $x \in \partial D$, regarded as an element of $M$, into itself. It is clear that $\iota$ is a differentiable map. (In fact, $(U \cap \partial D, \alpha\upharpoonright\partial D)$ and $(U, \alpha)$ are compatible charts in terms of which $\alpha\circ\iota\circ(\alpha\upharpoonright\partial D)^{-1}$ is just the inclusion map of $R^{n-1} \to R^n$.)

Let $x$ be a point of $\partial D$ regarded as a point of $M$, and let $\xi$ be an element of $T_x(M)$. We say that $\xi$ points into $D$ if for every curve $C$ with $C'(0) = \xi$, we have $C(t) \in D$ for sufficiently small positive $t$ (Fig. 10.6). In terms of a chart $(U, \alpha)$ of type (iii), let $\xi_\alpha = \langle \xi^1, \dots, \xi^n\rangle$. Then it is clear that $\xi$ points into $D$ if and only if $\xi^n > 0$. Similarly, a tangent vector $\xi$ points out of $D$ (obvious definition) if and only if $\xi^n < 0$. If $\xi^n = 0$, then $\xi$ is tangent to the boundary; that is, it lies in $\iota_*T_x(\partial D)$.

Fig. 10.6

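Let $\rho$ be a density on $M$ and $X$ a vector field on $M$. Define the density $\rho_X$ on $\partial D$ by

$$\rho_X(\xi_1, \dots, \xi_{n-1}) = \rho(\iota_*\xi_1, \dots, \iota_*\xi_{n-1}, X(x)) \quad\text{for } \xi_i \in T_x(\partial D). \tag{6.1}$$

It is easy to check that (6.1) defines a density. (This is left as an exercise for the reader.) If $(U, \alpha)$ is a chart of type (iii) about $x$ and $X_\alpha = \langle X^1, \dots, X^n\rangle$, then applying (4.3) to the chart $(U \cap \partial D, \alpha\upharpoonright\partial D)$ and the density $\rho_X$, we see that

$$(\rho_X)_{\alpha\upharpoonright\partial D} = \rho\left(\frac{\partial}{\partial x^1}, \dots, \frac{\partial}{\partial x^{n-1}}, X\right).$$

Let $A$ be the linear transformation of $T_x(M)$ given by

$$A\frac{\partial}{\partial x^1} = \frac{\partial}{\partial x^1}, \quad\dots,\quad A\frac{\partial}{\partial x^{n-1}} = \frac{\partial}{\partial x^{n-1}}, \quad A\frac{\partial}{\partial x^n} = X.$$

The matrix of $A$ is

$$\begin{pmatrix}
1 & 0 & \cdots & 0 & X^1 \\
0 & 1 & \cdots & 0 & X^2 \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & \cdots & 1 & X^{n-1} \\
0 & \cdots & \cdots & 0 & X^n
\end{pmatrix}$$

and therefore $|\det A| = |X^n|$. Thus we have

$$(\rho_X)_{\alpha\upharpoonright\partial D} = |X^n|\,\rho_\alpha \quad\text{at all points of } \alpha(U \cap \partial D). \tag{6.2}$$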


We can now state our results.


Theorem 6.1 (The divergence theorem).† Let $D$ be a domain with regular boundary, let $\rho \in P$, and let $X$ be a smooth vector field on $M$. Define the function $\epsilon_X$ on $\partial D$ by

$$\epsilon_X(x) = \begin{cases} 1 & \text{if } X(x) \text{ points out of } D, \\ 0 & \text{if } X(x) \text{ is tangent to } \partial D, \\ -1 & \text{if } X(x) \text{ points into } D. \end{cases}$$

Then

$$\int_D \operatorname{div}\langle X, \rho\rangle = \int_{\partial D} \epsilon_X\rho_X. \tag{6.3}$$

Remark. In terms of a chart of type (iii), the function $\epsilon_X$ is given by

$$\epsilon_X = -\operatorname{sgn} X^n. \tag{6.4}$$


† This formulation and proof of the divergence theorem was suggested to us by Richard Rasala.


Fig. 10.7    Fig. 10.8    Fig. 10.9    Fig. 10.10


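Proof. Let $\mathcal{A}$ be an atlas of $M$ each of whose charts is one of the three types. Let $\{g_j\}$ be a partition of unity subordinate to $\mathcal{A}$. Write $\rho = \sum g_j\rho$. This is a finite sum. Since both sides of (6.3) are linear functions of $\rho$, it suffices to verify (6.3) for each of the summands $g_j\rho$. Changing our notation (replacing $g_j\rho$ by $\rho$), we reduce the problem to proving (6.3) under the additional assumption $\operatorname{supp}\rho \subset U$, where $(U, \alpha)$ is a chart of type (i), (ii), or (iii). There are therefore three cases to consider.


CASE I. $\operatorname{supp}\rho \subset U$ and $U \cap D = \emptyset$. (See Fig. 10.7.) Then both sides of (6.3) vanish, and so (6.3) is correct.


CASE II. $\operatorname{supp}\rho \subset U$ with $U \subset \operatorname{int} D$. (See Fig. 10.8.) Then the right-hand side of (6.3) vanishes. We must show that the left-hand side does also. But

$$\int_D \operatorname{div}\langle X, \rho\rangle = \int_U \operatorname{div}\langle X, \rho\rangle = \int_{\alpha(U)} \sum_i \frac{\partial(X^i\rho_\alpha)}{\partial x^i} = \sum_i \int_{\alpha(U)} \frac{\partial(X^i\rho_\alpha)}{\partial x^i}.$$

Now each of the functions $X^i\rho_\alpha$ has its support lying inside $\alpha(U)$. Choose some large $R$ so that $\alpha(U) \subset \square_{-R}^{R}$ (the cube $[-R, R]^n$). We extend the domain of definition of $X^i\rho_\alpha$ to all of $R^n$ by setting it equal to zero outside $\alpha(U)$; we can then replace $\int_{\alpha(U)}$ by $\int_{\square_{-R}^{R}}$. (See Fig. 10.9.) Writing the integral as an iterated integral and integrating with respect to $x^i$ first, we see that

$$\int_{\alpha(U)} \frac{\partial X^i\rho_\alpha}{\partial x^i} = \int \left[X^i\rho_\alpha(\dots, R, \dots) - X^i\rho_\alpha(\dots, -R, \dots)\right] dx^1\cdots dx^{i-1}\,dx^{i+1}\cdots dx^n = 0.$$

This last integral vanishes, because the function $X^i\rho_\alpha$ vanishes outside $\alpha(U)$.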


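CASE III. $\operatorname{supp}\rho$ is contained in a chart of type (iii). (See Fig. 10.10.) Then

$$\int_D \operatorname{div}\langle X, \rho\rangle = \int_{U\cap D} \operatorname{div}\langle X, \rho\rangle = \sum_i \int_{\alpha(U\cap D)} \frac{\partial X^i\rho_\alpha}{\partial x^i}.$$

Now

$$\alpha(U \cap D) = \alpha(U) \cap \{v^n \ge 0\}.$$

We can therefore replace the domain of integration by the rectangle $\square_{\langle -R, \dots, -R,\, 0\rangle}^{\langle R, \dots, R\rangle}$. (See Fig. 10.11.) For $1 \le i < n$ all the integrals in the sum vanish as before. For $i = n$ we obtain

$$\int_D \operatorname{div}\langle X, \rho\rangle = -\int_{R^{n-1}} X^n\rho_\alpha(x^1, \dots, x^{n-1}, 0)\,dx^1\cdots dx^{n-1}.$$

Fig. 10.11

If we compare this with (6.2) and (6.4), we see that this is exactly the assertion of (6.3). □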


If the manifold $M$ is given a Riemann metric, then we can give an alternative version of the divergence theorem. Let $dV$ be the volume density of the Riemann metric, so that

$$dV(\xi_1, \dots, \xi_n) = |\det((\xi_i, \xi_j))|^{1/2}, \quad \xi_i \in T_x(M),$$

is the volume of the parallelepiped spanned by the $\xi_i$ in the tangent space (with respect to the Euclidean metric given by the scalar product on the tangent space).

Now the map $\iota$ is an immersion, and therefore we get an induced Riemann metric on $\partial D$. Let $dS$ be the corresponding volume density on $\partial D$. Thus, if $\{\xi_i\}_{i=1,\dots,n-1}$ are $n - 1$ vectors in $T_x(\partial D)$, $dS(\xi_1, \dots, \xi_{n-1})$ is the $(n-1)$-dimensional volume of the parallelepiped spanned by $\iota_*\xi_1, \dots, \iota_*\xi_{n-1}$ in $\iota_*T_x(\partial D) \subset T_x(M)$. For any $x \in \partial D$ let $\mathbf{n} \in T_x(M)$ be the vector of unit length which is orthogonal to $\iota_*T_x(\partial D)$ and which points out of $D$ (Fig. 10.12). We clearly have

$$dS(\xi_1, \dots, \xi_{n-1}) = dV(\iota_*\xi_1, \dots, \iota_*\xi_{n-1}, \mathbf{n}).$$

Fig. 10.12    Fig. 10.13

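For any vector $X(x) \in T_x(M)$ (Fig. 10.13) the volume of the parallelepiped spanned by $\iota_*\xi_1, \dots, \iota_*\xi_{n-1}, X(x)$ is $|(X(x), \mathbf{n})|\,dS(\xi_1, \dots, \xi_{n-1})$. [In fact, write

$$X(x) = (X(x), \mathbf{n})\,\mathbf{n} + m,$$

where $m \in \iota_*T_x(\partial D)$.] If we compare this with (6.1), we see that

$$dV_X = |(X, \mathbf{n})|\,dS. \tag{6.5}$$

Furthermore, it is clear that

$$\epsilon_X(x) = \operatorname{sgn}\,(X(x), \mathbf{n}).$$

Let $\rho$ be any density on $M$. Then we can write

$$\rho = f\,dV,$$

where $f$ is a function. Furthermore, we clearly have $\rho_X = f\,dV_X$ and

$$\operatorname{div}\langle X, \rho\rangle = \operatorname{div}\langle X, f\,dV\rangle.$$

We can then rewrite (6.3) as

$$\int_D \operatorname{div}\langle X, f\,dV\rangle = \int_{\partial D} f\cdot(X, \mathbf{n})\,dS. \tag{6.6}$$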
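Formula (6.6) can be illustrated numerically in the Euclidean plane. The sketch below rests on assumptions of our own: $D$ is the closed unit disk, $f = 1$ (so $\rho$ is the Euclidean area density), and the vector field $X = (x^3, y^3)$ is a hypothetical choice. Both sides are approximated with the midpoint rule; the exact common value is $3\pi/2$.

```python
import math

def X(x, y):
    """Components of the illustrative vector field X = (x^3, y^3)."""
    return (x ** 3, y ** 3)

def div_X(x, y):
    """Euclidean divergence of X: 3x^2 + 3y^2."""
    return 3.0 * x * x + 3.0 * y * y

def volume_side(nr=400, nt=400):
    """Midpoint rule in polar coordinates for the integral of div X over the unit disk."""
    dr, dt = 1.0 / nr, 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            # r * dr * dt is the Euclidean area element in polar coordinates
            total += div_X(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

def boundary_side(nt=4000):
    """Midpoint rule for the integral of (X, n) dS over the unit circle."""
    dt = 2.0 * math.pi / nt
    total = 0.0
    for j in range(nt):
        t = (j + 0.5) * dt
        nx, ny = math.cos(t), math.sin(t)   # outward unit normal equals the position vector
        fx, fy = X(nx, ny)
        total += (fx * nx + fy * ny) * dt   # dS = dt on the unit circle
    return total

lhs = volume_side()
rhs = boundary_side()
print(lhs, rhs, 1.5 * math.pi)
```

The three printed numbers should agree up to quadrature error.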


7. MORE COMPLICATED DOMAINS 


For many purposes, Theorem 6.1 is not quite sufficiently broad. The trouble is that we would like to apply (6.3) to domains whose boundaries are not completely smooth. For instance, we would like to apply it to a rectangle in $R^n$. Now the boundary of a rectangle is regular at all points except those lying on an edge (i.e., the intersection of two faces). Since the edges form a set "of dimension $n - 2$", we would expect that their presence does not invalidate (6.3). This is in fact the case.

Let $M$ be a differentiable manifold, and let $D$ be a subset of $M$. We say that $D$ is a domain with almost regular boundary if to every $x \in M$ there is a chart $(U, \alpha)$ about $x$, with coordinates $x^1, \dots, x^n$, such that one of the following four possibilities holds:

i) $U \cap D = \emptyset$;

ii) $U \subset D$;

iii) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \dots, v^n\rangle \in R^n : v^n \ge 0\}$;

iv) $\alpha(U \cap D) = \alpha(U) \cap \{v = \langle v^1, \dots, v^n\rangle \in R^n : v^k \ge 0, \dots, v^n \ge 0\}$.

The novel point is that we are now allowing for possibility (iv) where $k < n$. This, of course, is a new possibility only if $n > 1$. Let us assume $n > 1$ and see what (iv) allows. We can write $\alpha(U \cap \partial D)$ as the union of certain open subsets lying in $(n - 1)$-dimensional subspaces of $R^n$, together with a union of portions lying in subspaces of dimension $n - 2$.


Fig. 10.14


In fact, for $k \le p \le n$ let

$$H_k^p = \{v : v^k > 0, \dots, v^p = 0, v^{p+1} > 0, \dots, v^n > 0\}.$$

Thus $H_k^p$ is an open subset of the $(n - 1)$-dimensional subspace given by $v^p = 0$. (See Fig. 10.14.) We can write

$$\alpha(U \cap \partial D) \subset \alpha(U) \cap \{(H_k^k \cup H_k^{k+1} \cup \cdots \cup H_k^n) \cup S\},$$

where $S$ is the union of the subspaces (of dimension $n - 2$) where at least two of the $v^p$ vanish.


Fig. 10.15


Observe that if $x \in U \cap \partial D$ is such that $\alpha(x) \in H_k^p$ for some $p$, then there is a chart about $x$ of type (iii). In fact, simply renumber the coordinates so that $v^p$ becomes $v^n$, that is, map $R^n \xrightarrow{\psi} R^n$ by sending $\langle v^1, \dots, v^n\rangle \mapsto \langle w^1, \dots, w^n\rangle$, where

$$\begin{aligned}
w^i &= v^i && \text{for } i < p, \\
w^i &= v^{i+1} && \text{for } p \le i < n, \\
w^n &= v^p.
\end{aligned}$$

Then in a sufficiently small neighborhood $U^1$ of $x$ the chart $(U^1, \psi\circ\alpha)$ is of type (iii). (See Fig. 10.15.)


We next observe that the set of $x \in \partial D$ having a neighborhood of type (iii) forms a differentiable manifold. The argument is just as before. The only difference is that this time these points do not exhaust all of $\partial D$. We shall denote this manifold by $\widetilde{\partial D}$. Thus $\widetilde{\partial D}$ is a manifold which, as a set, is not $\partial D$ but only the "regular" points of $\partial D$, that is, those having charts of type (iii).


Theorem 7.1 (The divergence theorem). Let $M$ be an $n$-dimensional manifold, and let $D \subset M$ be a domain with almost regular boundary. Let $\widetilde{\partial D}$ be as above, and let $\iota$ be the injection of $\widetilde{\partial D} \to M$. Then for any $\rho \in P$ we have

$$\int_D \operatorname{div}\langle X, \rho\rangle = \int_{\widetilde{\partial D}} \epsilon_X\rho_X. \tag{7.1}$$


Proof. The proof proceeds as before. We choose an atlas of charts of types (i) through (iv) and a partition of unity $\{g_j\}$ subordinate to the atlas. We write $\rho = \sum g_j\rho$ and now have four cases to consider. The first three cases have already been handled.

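The new case arises when $\rho$ has its support in $U$, where $(U, \alpha)$ is a chart of type (iv). We must evaluate

$$\int_{\alpha(U\cap D)} \sum_i \frac{\partial X^i\rho_\alpha}{\partial x^i}.$$

The terms in the sum corresponding to $i < k$ make no contribution to the integral, as before. Let us extend $X^i\rho_\alpha$ to be defined on all of $R^n$ by setting it equal to zero outside $\alpha(U)$, just as before. Then, for $k \le i \le n$ we have

$$\int_{\alpha(U\cap D)} \frac{\partial X^i\rho_\alpha}{\partial x^i} = \int_B \frac{\partial X^i\rho_\alpha}{\partial x^i},$$

where $B = \{v : v^k \ge 0, \dots, v^n \ge 0\}$. Writing this as an iterated integral and integrating first with respect to $x^i$, we obtain

$$\int_B \frac{\partial X^i\rho_\alpha}{\partial x^i} = -\int_{A_i} X^i\rho_\alpha,$$

where the set $A_i \subset R^{n-1}$ is given by

$$A_i = \{\langle v^1, \dots, v^{i-1}, 0, v^{i+1}, \dots, v^n\rangle : v^l \ge 0 \text{ for } k \le l \le n,\ l \ne i\}.$$

Fig. 10.16

Note that $A_i$ differs from $H_k^i$ by a set of content zero in $R^{n-1}$ (namely, where at least one of the $v^l = 0$ for $k \le l \le n$, $l \ne i$). Thus we can replace the $A_i$ by the $H_k^i$ in the integral. Summing over $k \le i \le n$, we get

$$\int_{\alpha(U\cap D)} \sum_i \frac{\partial X^i\rho_\alpha}{\partial x^i} = -\sum_{k\le i\le n} \int_{H_k^i} X^i\rho_\alpha,$$

which is exactly the assertion of Theorem 7.1 for case (iv). □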




Fig. 10.17 Fig. 10.18 


We should point out that even Theorem 7.1 does not cover all cases for which it is useful to have a divergence theorem. For instance, in the plane, Theorem 7.1 does apply to the case where $D$ is a triangle. (See Fig. 10.16.) This is because we can "stretch" each angle to a right angle (in fact, we can do this by a linear change of variables of $R^2$). (See Fig. 10.17.)

However, Theorem 7.1 does not apply to a quadrilateral such as the one in Fig. 10.18, since there is no $C^1$-transformation that will convert an angle greater than $\pi$ into one smaller than $\pi$ (since its Jacobian at the corner must carry lines into lines). Thus Theorem 7.1 doesn't apply directly. However, we can write the quadrilateral as the union of two triangles, apply Theorem 7.1 to each triangle, and note that the contributions of each triangle coming from the common boundary cancel each other out. Thus the divergence theorem does apply to our quadrilateral.

This procedure works in a quite general context. In fact, it works for all cases where we shall need the divergence theorem in this book: either Theorem 7.1 applies directly, or we can reduce to it by a finite subdivision of our domain, followed by a limiting argument. We shall not, however, formulate a general theorem covering all such cases; it is clear in each instance how to proceed.


EXERCISES 


In Euclidean space we shall write $\operatorname{div} X$ instead of $\operatorname{div}\langle X, \rho\rangle$ when $\rho$ is taken to be the Euclidean volume density.


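7.1 Let $x, y, z$ be rectangular coordinates on $E^3$. Let the vector field $X$ be given by

$$X = r^2\left(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} + z\frac{\partial}{\partial z}\right),$$

where $r^2 = x^2 + y^2 + z^2$. Show directly that

$$\int_S (X, \mathbf{n})\,dA = \int_B \operatorname{div} X$$

by integrating both sides. Here $B$ is a ball centered at the origin and $S$ is its boundary.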


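7.2 Let the vector field $Y$ be given by

$$Y = Y_r\mathbf{n}_r + Y_\theta\mathbf{n}_\theta + Y_\varphi\mathbf{n}_\varphi$$

in terms of polar "coordinates" $r, \theta, \varphi$ on $E^3$, where $\mathbf{n}_r$, $\mathbf{n}_\theta$, and $\mathbf{n}_\varphi$ are the unit vectors in the directions $\partial/\partial r$, $\partial/\partial\theta$, and $\partial/\partial\varphi$ respectively. Show that

$$\operatorname{div} Y = \frac{1}{r^2\sin\varphi}\left\{\frac{\partial}{\partial r}(r^2\sin\varphi\,Y_r) + \frac{\partial}{\partial\theta}(r\,Y_\theta) + \frac{\partial}{\partial\varphi}(r\sin\varphi\,Y_\varphi)\right\}.$$

7.3 Compute the divergence of a vector field in terms of polar coordinates in the plane.


7.4 Compute the divergence of a vector field in terms of cylindrical coordinates in $E^3$.


7.5 Let $\sigma$ be the volume (area) density on the unit sphere $S^2$. Compute $\operatorname{div}\langle X, \sigma\rangle$ in terms of the coordinates $\theta, \varphi$ (polar coordinates) on the sphere.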


CHAPTER 11 


EXTERIOR CALCULUS 


Let $M$ be a differentiable manifold and let $\omega$ be a linear differential form on $M$. For any differentiable curve $C: [a, b] \to M$ we can consider the integral $\int_a^b (C'(t), \omega_{C(t)})\,dt$. Let $[c, d] \to [a, b]$ be a differentiable map given by $s \mapsto t(s)$. The curve $B: [c, d] \to M$ given by $B(s) = C(t(s))$ satisfies

$$B'(s) = t'(s)\,C'(t(s)).$$

Thus if $t'(s) > 0$ for all $s$,

$$\int_c^d (B'(s), \omega_{B(s)})\,ds = \int_a^b (C'(t), \omega_{C(t)})\,dt.$$

Thus a linear differential form is something we can integrate over "oriented" curves of $M$, and the integral is independent of the parametrization. In this chapter we shall introduce objects which can be integrated over "oriented $q$-dimensional surfaces" of $M$ and study their properties.


1. EXTERIOR DIFFERENTIAL FORMS 


We defined a linear differential form to be a rule which assigns an element of $T_x^*(M)$ to each $x \in M$. We can regard $T_x^*(M)$ as $\mathfrak{A}^1(T_x(M))$. In view of this, we make the following generalization of this definition. By an exterior differential form of degree $q$ on $M$ we mean a rule which assigns an element of $\mathfrak{A}^q(T_x(M))$ to each $x \in M$. If $\omega$ is an exterior form of degree $q$ and $(U, \alpha)$ is a chart, then, since $\alpha$ identifies each $T_x(M)$ with $V$ for $x \in U$, we obtain an $\mathfrak{A}^q(V)$-valued function, $\omega_\alpha$, on $\alpha(U)$ defined by

$$\omega_\alpha(v)(\xi_\alpha^1, \dots, \xi_\alpha^q) = \omega(x)(\xi^1, \dots, \xi^q) \quad\text{if } v = \alpha(x) \text{ and } \xi^1, \dots, \xi^q \in T_x(M).$$

It is easy to write down the transition laws. In fact, if $(W, \beta)$ is a second chart, we have

$$\omega_\beta(\beta(x))(\xi_\beta^1, \dots, \xi_\beta^q) = \omega(x)(\xi^1, \dots, \xi^q) = \omega_\alpha(\alpha(x))(\xi_\alpha^1, \dots, \xi_\alpha^q)$$

or, since $\xi_\beta = J_{\beta\circ\alpha^{-1}}(\alpha(x))(\xi_\alpha)$ for $\xi \in T_x(M)$, we see that

$$\omega_\alpha(v)(\xi_\alpha^1, \dots, \xi_\alpha^q) = \omega_\beta(\beta\circ\alpha^{-1}(v))(J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha^1, \dots, J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha^q). \tag{1.1}$$

In order to write (1.1) in a less cumbersome form, we introduce the following notation. Let $V_1$ and $V_2$ be vector spaces, and let $l: V_1 \to V_2$ be a linear map.


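We define $\mathfrak{A}^p(l)$ to be the linear map of $\mathfrak{A}^p(V_2) \to \mathfrak{A}^p(V_1)$ given by

$$\mathfrak{A}^p(l)(\omega)(v_1, \dots, v_p) = \omega(l(v_1), \dots, l(v_p))$$

for all $\omega \in \mathfrak{A}^p(V_2)$ and $v_1, \dots, v_p \in V_1$. Note that under the identification of $\mathfrak{A}^1(V)$ with $V^*$ the map $\mathfrak{A}^1(l)$ coincides with the map $l^*: V_2^* \to V_1^*$. Note also that if $\omega_1 \in \mathfrak{A}^p(V_2)$ and $\omega_2 \in \mathfrak{A}^q(V_2)$, then

$$\mathfrak{A}^p(l)\omega_1 \wedge \mathfrak{A}^q(l)\omega_2 = \mathfrak{A}^{p+q}(l)(\omega_1 \wedge \omega_2). \tag{1.2}$$

This follows directly from the definitions. Also, if $l_1: V_1 \to V_2$ and $l_2: V_2 \to V_3$, then

$$\mathfrak{A}^p(l_2\circ l_1) = \mathfrak{A}^p(l_1)\circ\mathfrak{A}^p(l_2). \tag{1.3}$$

It is clear that if $l$ depends differentiably on some parameters, then so does $\mathfrak{A}^p(l)$ for any $p$.
We can now write (1.1) as

$$\omega_\alpha(v) = \mathfrak{A}^q(J_{\beta\circ\alpha^{-1}}(v))\,\omega_\beta(\beta\circ\alpha^{-1}(v)). \tag{1.1'}$$

It is clear from (1.1') that it is consistent to require that $\omega_\alpha$ be a smooth function. We therefore say that $\omega$ is a smooth differential form if all the functions $\omega_\alpha$ are $C^\infty$ on $\alpha(U)$ for all charts $(U, \alpha)$. As usual, it suffices to verify this for all charts in an atlas. We let $\bigwedge^q(M)$ denote the space of all smooth exterior forms of degree $q$.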

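Let $\omega_1 \in \bigwedge^p(M)$ and $\omega_2 \in \bigwedge^q(M)$. We define the exterior $(p + q)$-form $\omega_1 \wedge \omega_2$ by

$$(\omega_1 \wedge \omega_2)(x) = \omega_1(x) \wedge \omega_2(x) \quad\text{for all } x \in M.$$

It is easy to check that $\omega_1 \wedge \omega_2$ is a smooth $(p + q)$-form. We thus get a multiplication on exterior forms. To make the formalism complete, it is convenient to denote the space of differentiable functions on $M$ by $\bigwedge^0(M)$ and to denote the product of a function $f$ and a $p$-form $\omega$ by $f\omega$ or $f \wedge \omega$. This product is given by

$$(f \wedge \omega)(x) = (f\omega)(x) = f(x)\,\omega(x) \quad\text{for all } x \in M.$$

We have thus defined, for all $0 \le p \le n$ and $0 \le q \le n$, a multiplication sending $\omega_1 \in \bigwedge^p(M)$ and $\omega_2 \in \bigwedge^q(M)$ into $\omega_1 \wedge \omega_2 \in \bigwedge^{p+q}(M)$ (where $\omega_1 \wedge \omega_2 = 0$ if $p + q > n = \dim M$). The rules for the $\wedge$-product on antisymmetric tensors carry over and thus, for instance,

$$\begin{aligned}
\omega_1 \wedge \omega_2 &= (-1)^{pq}\,\omega_2 \wedge \omega_1 \quad\text{if } \omega_1 \in \textstyle\bigwedge^p(M) \text{ and } \omega_2 \in \textstyle\bigwedge^q(M), \\
\omega_1 \wedge (\omega_2 \wedge \omega_3) &= (\omega_1 \wedge \omega_2) \wedge \omega_3, \\
\omega_1 \wedge (\omega_2 + \omega_3) &= \omega_1 \wedge \omega_2 + \omega_1 \wedge \omega_3,
\end{aligned}$$

and so on.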
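The algebraic rules for the $\wedge$-product can be exercised on forms with constant coefficients. The following sketch is our own illustration, not the book's construction: a form at a point is stored as a dictionary mapping a strictly increasing tuple of indices $(i_1, \dots, i_q)$ to the coefficient of $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$, and the helper names `sort_with_sign`, `wedge`, and `scale` are hypothetical.

```python
def sort_with_sign(indices):
    """Sort an index tuple, returning (sign, sorted_tuple).

    The sign is that of the sorting permutation; it is 0 when an index
    repeats, since dx^i ^ dx^i = 0.
    """
    idx = list(indices)
    if len(set(idx)) != len(idx):
        return 0, ()
    sign = 1
    # Bubble sort: each adjacent transposition flips the sign once.
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(a, b):
    """Exterior product of two forms given as {index tuple: coefficient}."""
    out = {}
    for ia, ca in a.items():
        for ib, cb in b.items():
            sign, key = sort_with_sign(ia + ib)
            if sign:
                out[key] = out.get(key, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def scale(c, a):
    """Multiply a form by the constant c."""
    return {k: c * v for k, v in a.items() if c * v != 0}

# Illustrative forms on R^4.
w1 = {(1,): 2, (3,): -1}        # 2 dx^1 - dx^3            (degree 1)
w2 = {(1, 2): 1, (2, 4): 5}     # dx^1^dx^2 + 5 dx^2^dx^4  (degree 2)
w3 = {(2,): 3, (4,): 1}         # 3 dx^2 + dx^4            (degree 1)

print(wedge(w1, w2))
print(wedge(w1, w3), wedge(w3, w1))
```

With $p = 1$ and $q = 2$ the sign $(-1)^{pq}$ is $+1$, so `wedge(w1, w2)` and `wedge(w2, w1)` coincide, whereas the two 1-forms `w1` and `w3` anticommute.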


Let $M_1$ and $M_2$ be differentiable manifolds, and let $\varphi: M_1 \to M_2$ be a differentiable map. For each $\omega \in \bigwedge^q(M_2)$ we define the form $\varphi^*\omega \in \bigwedge^q(M_1)$ by

$$\varphi^*\omega(x) = \mathfrak{A}^q(\varphi_{*x})(\omega(\varphi(x))). \tag{1.4}$$

It is easy to check that $\varphi^*\omega$ is indeed an element of $\bigwedge^q(M_1)$, that is, it is a smooth $q$-form. Note also that (7.5) of Chapter 9 is a special case of (1.4), namely the case $q = 1$. (If we make the convention that $\mathfrak{A}^0(l) = \mathrm{id}$, then the case $q = 0$ of (1.4) is the rule for pullback of functions.)

It follows from (1.4) that $\varphi^*$ is linear, that is,

$$\varphi^*(\omega_1 + \omega_2) = \varphi^*(\omega_1) + \varphi^*(\omega_2), \tag{1.5}$$

and from (1.2) that

$$\varphi^*(\omega_1 \wedge \omega_2) = \varphi^*(\omega_1) \wedge \varphi^*(\omega_2). \tag{1.6}$$

If $\varphi_t$ is a one-parameter group on a manifold $M$ with infinitesimal generator $X$, then we can show that the limit

$$\lim_{t\to 0}\frac{\varphi_t^*\omega - \omega}{t} = D_X\omega$$

exists for any $\omega \in \bigwedge^q(M)$. The proof of the existence of this limit is straightforward and will be omitted. We shall derive a useful formula allowing a simple calculation of $D_X\omega$ in Section 3.

Let us now see how to compute with the $\bigwedge^q(M)$ in terms of local coordinates. Let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \dots, x^n$. Then $dx^i \in \bigwedge^1(U)$ (where by $\bigwedge^q(U)$ we mean the set of differentiable $q$-forms defined on $U$). For any $i_1, \dots, i_q$ the form $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$ belongs to $\bigwedge^q(U)$, and for every $x \in U$ the forms

$$\{(dx^{i_1} \wedge \cdots \wedge dx^{i_q})(x)\}_{i_1 < \cdots < i_q}$$

form a basis for $\mathfrak{A}^q(T_x(M))$. From this it follows that every exterior form $\omega$ of degree $q$ which is defined on $U$ can be written as

$$\omega = \sum_{i_1 < \cdots < i_q} a_{i_1, \dots, i_q}\,dx^{i_1} \wedge \cdots \wedge dx^{i_q}, \tag{1.7}$$

where the $a$'s are functions; that is,

$$\omega(x) = \sum_{i_1 < \cdots < i_q} a_{i_1, \dots, i_q}(x)\,(dx^{i_1} \wedge \cdots \wedge dx^{i_q})(x)$$

for all $x \in U$. It is easy to see that $\omega \in \bigwedge^q(U)$ if and only if all the functions $a_{i_1, \dots, i_q}$ are $C^\infty$-functions on $U$.
If $(W, \beta)$ is a second chart with coordinates $y^1, \dots, y^n$ and

$$\omega = \sum_{j_1 < \cdots < j_q} b_{j_1, \dots, j_q}\,dy^{j_1} \wedge \cdots \wedge dy^{j_q}, \tag{1.8}$$

then it is easy to compute the transition law relating the $b$'s to the $a$'s on $U \cap W$. In fact, on $U \cap W$ we have

$$dy^j = \sum_i \frac{\partial y^j}{\partial x^i}\,dx^i, \tag{1.9}$$

where $y^j = y^j(x^1, \dots, x^n)$. Then all we have to do is to substitute (1.9) into (1.8) and collect the coefficients of $dx^{i_1} \wedge \cdots \wedge dx^{i_q}$. For instance, if $q = 2$,


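then we have

$$\begin{aligned}
\omega &= \sum_{j_1 < j_2} b_{j_1 j_2}\,dy^{j_1} \wedge dy^{j_2} \\
&= \sum_{j_1 < j_2} b_{j_1 j_2}\left(\frac{\partial y^{j_1}}{\partial x^1}dx^1 + \cdots + \frac{\partial y^{j_1}}{\partial x^n}dx^n\right) \wedge \left(\frac{\partial y^{j_2}}{\partial x^1}dx^1 + \cdots + \frac{\partial y^{j_2}}{\partial x^n}dx^n\right).
\end{aligned}$$

If we collect the coefficients of $dx^{i_1} \wedge dx^{i_2}$ (remember the $\wedge$-multiplication is anticommutative), we get

$$\omega = \sum_{i_1 < i_2}\left[\sum_{j_1 < j_2} b_{j_1 j_2}\left(\frac{\partial y^{j_1}}{\partial x^{i_1}}\frac{\partial y^{j_2}}{\partial x^{i_2}} - \frac{\partial y^{j_1}}{\partial x^{i_2}}\frac{\partial y^{j_2}}{\partial x^{i_1}}\right)\right] dx^{i_1} \wedge dx^{i_2}.$$

Thus

$$a_{i_1 i_2} = \sum_{j_1 < j_2} b_{j_1 j_2}\det\begin{pmatrix}
\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \dfrac{\partial y^{j_1}}{\partial x^{i_2}} \\[2ex]
\dfrac{\partial y^{j_2}}{\partial x^{i_1}} & \dfrac{\partial y^{j_2}}{\partial x^{i_2}}
\end{pmatrix}. \tag{1.10}$$

Although (1.10) looks a little formidable, the point is that all one has to remember is (1.9) and the law for $\wedge$-multiplication. For general $q$ the same argument gives

$$a_{i_1, \dots, i_q} = \sum_{j_1 < \cdots < j_q} b_{j_1, \dots, j_q}\det\begin{pmatrix}
\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_1}}{\partial x^{i_q}} \\
\vdots & & \vdots \\
\dfrac{\partial y^{j_q}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_q}}{\partial x^{i_q}}
\end{pmatrix}. \tag{1.11}$$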


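The formula for pullback takes exactly the same form. Let $\varphi: M_1 \to M_2$ be a differentiable map, and suppose that $(U, \alpha)$ and $(W, \beta)$ are compatible charts, where $x^1, \dots, x^n$ are the coordinates of $(U, \alpha)$ and $y^1, \dots, y^m$ are those of $(W, \beta)$. Then the $y^j\circ\varphi$ are functions on $U$ and can thus be written as

$$y^j\circ\varphi = y^j(x^1, \dots, x^n).$$

Since $\varphi^*dy^j = d(y^j\circ\varphi)$, we have

$$\varphi^*(dy^j) = \sum_i \frac{\partial y^j}{\partial x^i}\,dx^i. \tag{1.12}$$

If

$$\omega = \sum_{j_1 < \cdots < j_q} b_{j_1, \dots, j_q}\,dy^{j_1} \wedge \cdots \wedge dy^{j_q} \in \textstyle\bigwedge^q(W),$$

then, by (1.5) and (1.6),

$$\varphi^*(\omega) = \sum_{j_1 < \cdots < j_q} (b_{j_1, \dots, j_q}\circ\varphi)\,\varphi^*(dy^{j_1}) \wedge \cdots \wedge \varphi^*(dy^{j_q}). \tag{1.13}$$

The expression for (1.13) in terms of the $dx$'s can be computed by substituting (1.12) into (1.13) and collecting coefficients. The answer, of course, will look just like it did before. If

$$\varphi^*\omega = \sum_{i_1 < \cdots < i_q} a_{i_1, \dots, i_q}\,dx^{i_1} \wedge \cdots \wedge dx^{i_q},$$

then the $a$'s are given by

$$a_{i_1, \dots, i_q} = \sum_{j_1 < \cdots < j_q} (b_{j_1, \dots, j_q}\circ\varphi)\det\begin{pmatrix}
\dfrac{\partial y^{j_1}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_1}}{\partial x^{i_q}} \\
\vdots & & \vdots \\
\dfrac{\partial y^{j_q}}{\partial x^{i_1}} & \cdots & \dfrac{\partial y^{j_q}}{\partial x^{i_q}}
\end{pmatrix}. \tag{1.14}$$

Again, we emphasize that there is no need to remember a complicated looking formula like (1.14); Eqs. (1.5), (1.6), and (1.12) (and of course the rules for $\wedge$-multiplication) are sufficient. In many cases, it is much more convenient to do the substitutions directly than to use (1.14).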
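As an illustration of doing the substitutions directly, consider a standard example that is not taken from the text: the polar-coordinate map $\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$ from the half-plane $r > 0$ into $R^2$, and the form $\omega = dx \wedge dy$. The computation below uses only (1.6), (1.12), and the rules for $\wedge$-multiplication.

```latex
\begin{aligned}
\varphi^*(dx) &= d(r\cos\theta) = \cos\theta\,dr - r\sin\theta\,d\theta, \\
\varphi^*(dy) &= d(r\sin\theta) = \sin\theta\,dr + r\cos\theta\,d\theta, \\
\varphi^*(dx \wedge dy) &= \varphi^*(dx) \wedge \varphi^*(dy) \\
  &= (\cos\theta\,dr - r\sin\theta\,d\theta) \wedge (\sin\theta\,dr + r\cos\theta\,d\theta) \\
  &= r\cos^2\theta\,dr \wedge d\theta - r\sin^2\theta\,d\theta \wedge dr \\
  &= r\,(\cos^2\theta + \sin^2\theta)\,dr \wedge d\theta
   = r\,dr \wedge d\theta .
\end{aligned}
```

The coefficient $r$ agrees with what (1.14) predicts, since the determinant of the matrix of partial derivatives of $(x, y)$ with respect to $(r, \theta)$ is $r$.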


2. ORIENTED MANIFOLDS AND 
THE INTEGRATION OF EXTERIOR DIFFERENTIAL FORMS 


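Let $M$ be an $n$-dimensional manifold. Let $(U, \alpha)$ and $(W, \beta)$ be two charts on $M$ with coordinates $x^1, \dots, x^n$ and $y^1, \dots, y^n$. Let $\omega$ be an exterior differential form of degree $n$. Then we can write

$$\omega = a\,dx^1 \wedge \cdots \wedge dx^n \quad\text{on } U$$

and

$$\omega = b\,dy^1 \wedge \cdots \wedge dy^n \quad\text{on } W,$$

where the functions $a$ and $b$ are related on $U \cap W$ by (1.11), which, in this case ($q = n$), becomes

$$a = b\det\begin{pmatrix}
\dfrac{\partial y^1}{\partial x^1} & \cdots & \dfrac{\partial y^1}{\partial x^n} \\
\vdots & & \vdots \\
\dfrac{\partial y^n}{\partial x^1} & \cdots & \dfrac{\partial y^n}{\partial x^n}
\end{pmatrix}$$

or

$$a_\alpha(\alpha(x)) = b_\beta(\beta\circ\alpha^{-1}(\alpha(x)))\,\det J_{\beta\circ\alpha^{-1}}(\alpha(x))$$

or, finally,

$$a_\alpha(v) = b_\beta(\beta\circ\alpha^{-1}(v))\,\det J_{\beta\circ\alpha^{-1}}(v) \quad\text{for } v \in \alpha(U \cap W). \tag{2.1}$$

If $\rho$ is a density on $M$, then the transition laws for $\rho_\alpha$ are given by

$$\rho_\alpha(v) = \rho_\beta(\beta\circ\alpha^{-1}(v))\,|\det J_{\beta\circ\alpha^{-1}}(v)|. \tag{2.2}$$

Note that (2.2) and (2.1) look almost the same; the difference is the absolute-value sign that occurs in (2.2) but not in (2.1). In particular, if $(U, \alpha)$ and $(W, \beta)$ were such that $\det J_{\beta\circ\alpha^{-1}} > 0$, then (2.2) and (2.1) would agree for this pair of charts.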


This leads us to the following definition: An atlas $\mathcal{A}$ of $M$ is said to be oriented if for any pair of charts $(U, \alpha)$ and $(W, \beta)$ of $\mathcal{A}$ we have

$$\det J_{\beta\circ\alpha^{-1}}(\alpha(x)) > 0 \quad\text{for all } x \in U \cap W.$$

There is no guarantee that there exists an oriented atlas on a given manifold $M$. In fact, it is not difficult to show that there does not exist an oriented atlas on certain manifolds. (An example of a manifold possessing no oriented atlas is the Möbius strip.)

We say that a manifold $M$ is orientable if it has an oriented atlas.

Let $M$ be an orientable manifold, and let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two oriented atlases. We say that $\mathcal{A}_1$ and $\mathcal{A}_2$ have the same orientation, and write $\mathcal{A}_1 \sim \mathcal{A}_2$, if $\mathcal{A}_1 \cup \mathcal{A}_2$ is again an oriented atlas. To say that $\mathcal{A}_1 \sim \mathcal{A}_2$ means that for any $(U, \alpha) \in \mathcal{A}_1$ and any $(W, \beta) \in \mathcal{A}_2$ we have

$$\det J_{\beta\circ\alpha^{-1}}(v) > 0 \quad\text{on } \alpha(U \cap W).$$

It is clear that $\sim$ is an equivalence relation. An equivalence class of oriented atlases is called an orientation of $M$. An orientable manifold, together with a choice of orientation, will be called an oriented manifold. We shall denote an oriented manifold by $\mathbf{M}$. That is, $\mathbf{M}$ is a manifold $M$ together with a choice of orientation. Thus an oriented one-dimensional manifold has a preferred direction at each point (Fig. 11.1); an oriented two-dimensional manifold has a notion of clockwise versus counterclockwise direction (Fig. 11.2); and at any point of an oriented three-dimensional manifold we can distinguish between right- and left-handedness.


Fig. 11.1 Fig. 11.2 


In general, let $\mathbf{M}$ be an oriented manifold, and let $(U, \alpha)$ be a chart of $M$ with coordinates $x^1, \dots, x^n$. We say that $(U, \alpha)$ is a positive chart if $\det J_{\beta\circ\alpha^{-1}} > 0$ for any chart $(W, \beta)$ belonging to any oriented atlas defining (i.e., belonging to) the orientation. (It suffices to check this, of course, for all $(W, \beta)$ belonging to one fixed atlas defining the orientation.) Note that if $U$ is connected and $(U, \alpha)$ is not positive, then the chart $(U, \alpha')$, where

$$\alpha'(x) = \langle -v^1, v^2, \dots, v^n\rangle \quad\text{if } \alpha(x) = \langle v^1, v^2, \dots, v^n\rangle,$$

is a positive chart.

We shall say that $(U, \alpha)$ is a negative chart if $\det J_{\beta\circ\alpha^{-1}} < 0$ for all $(W, \beta)$ belonging to an atlas defining the orientation. (Thus, if $U$ is connected, then $(U, \alpha)$ must be either positive or negative.)


We now return to our initial observation comparing (2.1) with (2.2).


Proposition 2.1. Let $\mathbf{M}$ be an oriented $n$-dimensional manifold. We can identify exterior forms of degree $n$ with densities by sending the form $\omega$ into the density $\rho^\omega$, where for any positive chart $(U, \alpha)$ with coordinates $x^1, \dots, x^n$, the function $\rho_\alpha^\omega$ is determined by

$$\omega = \rho_\alpha^\omega(\alpha(\cdot))\,dx^1 \wedge \cdots \wedge dx^n \quad\text{on } U. \tag{2.3}$$

Another way of writing (2.3) is

$$\omega(\partial/\partial x^1, \dots, \partial/\partial x^n) = \rho^\omega(\partial/\partial x^1, \dots, \partial/\partial x^n). \tag{2.3'}$$

In other words, if $\omega = a\,dx^1 \wedge \cdots \wedge dx^n$ on $U$, then $\rho_\alpha^\omega = a_\alpha$. That $\rho^\omega$ is really a density follows from the fact that (2.2) reduces to (2.1) for all pairs of charts belonging to a positive atlas.

It is clear that this identification is additive,

$$\rho^{\omega_1 + \omega_2} = \rho^{\omega_1} + \rho^{\omega_2}, \tag{2.4}$$

and that for any function $f$,

$$\rho^{f\omega} = f\rho^\omega. \tag{2.5}$$

Furthermore, if $\omega(x) = 0$, then $\rho^\omega = 0$ at $x$. By the support of a differential form we mean, as usual, the closure of the set of $x$ for which $\omega(x) \ne 0$. We say that an $n$-form $\omega$ is locally absolutely integrable if the density $\rho^\omega$ is locally absolutely integrable. Note that to say that $\omega$ is locally absolutely integrable means that for any chart $(U, \alpha)$, with coordinates $x^1, \dots, x^n$ of some atlas $\mathcal{A}$, if

$$\omega = a\,dx^1 \wedge \cdots \wedge dx^n \quad\text{on } U,$$

then the function $a_\alpha = a\circ\alpha^{-1}$ is an absolutely integrable function on $\alpha(U)$.

Let $\Gamma(\mathbf{M})$ denote the space of absolutely integrable $n$-forms of compact support. It is clear that $\Gamma(\mathbf{M})$ is a vector space and that $f\omega \in \Gamma(\mathbf{M})$ if $f$ is a (bounded) contented function and $\omega \in \Gamma(\mathbf{M})$. As a consequence of Proposition 2.1 and Theorem 3.1 of Chapter 10, we can state:


Theorem 2.1. Let $\mathbf{M}$ be an oriented manifold. There exists a unique linear function $\int$ on $\Gamma(\mathbf{M})$ satisfying the following condition: If $\operatorname{supp}\omega \subset U$, where $(U, \alpha)$ is a positive chart with coordinates $x^1, \dots, x^n$, and if $\omega = a\,dx^1 \wedge \cdots \wedge dx^n$, then

$$\int \omega = \int_{\alpha(U)} a_\alpha. \tag{2.6}$$

Observe that we can write

$$\int \omega = \int \rho^\omega \quad\text{for all } \omega \in \Gamma(\mathbf{M}). \tag{2.7}$$

The recipe for computing $\int\omega$ is now very simple. We break $\omega$ up into small pieces such that each piece lies in some $U$. (We can ignore sets of content zero in the process.) If $\operatorname{supp}\omega \subset U$, and if $(U, \alpha)$ is a positive chart, we express $\omega$ as

$$\omega = a\,dx^1 \wedge \cdots \wedge dx^n.$$

And if $a$ is given as $a = a_\alpha(x^1, \dots, x^n)$, we integrate the function $a_\alpha$ over $R^n$. The computations are automatic. The one point that has to be checked is that the chart $(U, \alpha)$ is positive. If it is negative, then $\int\omega$ is given by $-\int a_\alpha$.
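A small worked example, with choices that are ours and not the text's, shows why the sign matters. Take $M = R^1$ oriented so that the identity chart (coordinate $x$) is positive, let $\omega = f(x)\,dx$ with $f$ continuous and supported in $(0, 1)$, and compare with the negative chart $\alpha(x) = -x$ (coordinate $v$) on $U = (0, 1)$.

```latex
\begin{aligned}
\text{positive chart } x:&\quad \omega = f(x)\,dx,
   &\int\omega &= \int_0^1 f(x)\,dx, \\
\text{negative chart } v = -x:&\quad \omega = -f(-v)\,dv,
   &a_\alpha(v) &= -f(-v) \ \text{ on } \alpha(U) = (-1, 0), \\
&&\int_{\alpha(U)} a_\alpha &= -\int_{-1}^{0} f(-v)\,dv = -\int_0^1 f(x)\,dx, \\
&&-\int_{\alpha(U)} a_\alpha &= \int_0^1 f(x)\,dx = \int\omega .
\end{aligned}
```

Here $\int_{\alpha(U)} a_\alpha$ denotes the ordinary (unoriented) integral of the function $a_\alpha$ over the set $\alpha(U)$, so the extra minus sign for a negative chart is exactly what restores agreement with the positive chart.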

Let $M_1$ be an oriented manifold of dimension $q$, let $\varphi\colon M_1 \to M_2$ be a differentiable map, and let $\omega \in \Lambda^q(M_2)$. Then for any contented compact set $A \subset M_1$ the form $e_A\varphi^*(\omega)$ belongs to $\Gamma(M_1)$, so we can consider its integral. This integral is sometimes denoted by $\int_{\varphi(A)}\omega$; that is, we make the definition

\[ \int_{\varphi(A)} \omega = \int e_A\,\varphi^*(\omega). \tag{2.8} \]

If we regard $\varphi(A)$ as an “oriented $q$-dimensional surface” in $M_2$, then we see that the elements of $\Lambda^q(M_2)$ are objects that we can integrate over such “surfaces”. (Of course, if $q = 1$, we say “curves”.)


Fig. 11.3        Fig. 11.4


Let us illustrate by some examples. Suppose that $M_2 = \mathbb{R}^n$, and let $A \subset \mathbb{R}^1$ be the interval $a \le t \le b$. Let $x^1,\ldots,x^n$ be the coordinates of $\mathbb{R}^n$, and let $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$. We regard $\mathbb{R}^1$ as an oriented manifold on which the identity chart is positive (and its coordinate is $t$). If $C\colon \mathbb{R}^1 \to \mathbb{R}^n$ is a differentiable curve (Fig. 11.3), then

\[ \int_{C([a,b])} \omega = \int_a^b \langle C'(t), \omega\rangle\,dt. \tag{2.9} \]


From this last expression we see that $C$ does not have to be differentiable everywhere in order for $\int_{C([a,b])}\omega$ to make sense. In fact, if $C$ is differentiable everywhere on $\mathbb{R}$ except at a finite number of points, and if $C'(t)$ is always bounded (when regarded as an element of $\mathbb{R}^n$), then the function $\langle C'(\cdot), \omega\rangle$ is defined everywhere except for a set of content zero and is bounded. Thus $C^*(\omega)$ is a contented density and (2.9) still makes sense. Now the curve can have corners. (See Fig. 11.4.)
It should be observed that if $\omega = df$ (and if $C$ is continuous), then

\[ \int_{C([a,b])} df = \int_{[a,b]} C^*(df) = \int_a^b (f\circ C)'(t)\,dt = f(C(b)) - f(C(a)). \tag{2.10} \]

In this case the integral depends not on the particular curve $C$ but on the endpoints. In general, $\int_C \omega$ depends on the curve $C$. We will obtain conditions for it to be independent of $C$ in Section 5.
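
A short worked instance of (2.9) and (2.10) may help. The curve and the two forms below are illustrative choices, not taken from the text: $C(t) = (\cos t, \sin t)$ on $[0, 2\pi]$ in $\mathbb{R}^2$, first with $\omega = x\,dy - y\,dx$ and then with $\omega = df$ for $f(x, y) = xy$.

```latex
% Formula (2.9) for \omega = x\,dy - y\,dx along C(t) = (\cos t, \sin t):
\[
  \langle C'(t), \omega \rangle
    = \cos t \cdot \cos t - \sin t \cdot (-\sin t) = 1,
  \qquad
  \int_{C([0,2\pi])} \omega = \int_0^{2\pi} 1 \, dt = 2\pi .
\]
% Formula (2.10) for f(x, y) = xy, so that df = y\,dx + x\,dy:
\[
  \langle C'(t), df \rangle = -\sin^2 t + \cos^2 t = \cos 2t,
  \qquad
  \int_0^{2\pi} \cos 2t \, dt = 0 = f(C(2\pi)) - f(C(0)) .
\]
```

The first integral is nonzero on a closed curve, so $x\,dy - y\,dx$ cannot be the $d$ of a function on a region containing the circle; the second vanishes, as (2.10) predicts for any closed curve.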

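In the next example let $M_2 = \mathbb{R}^3$ and $M_1 = U \subset \mathbb{R}^2$, where $(u, v)$ are Euclidean coordinates on $\mathbb{R}^2$ and $x, y, z$ are Euclidean coordinates on $\mathbb{R}^3$. Let

\[ \omega = P\,dx\wedge dy + Q\,dx\wedge dz + R\,dy\wedge dz \]

be an element of $\Lambda^2(\mathbb{R}^3)$. If $\varphi\colon U \to \mathbb{R}^3$ is given by the functions $x(u, v)$, $y(u, v)$, and $z(u, v)$, then for $A \subset U$,

\[
\begin{aligned}
\int_{\varphi(A)} \omega &= \int e_A\,\varphi^*\omega = \int e_A\,\varphi^*(P\,dx\wedge dy + Q\,dx\wedge dz + R\,dy\wedge dz) \\
&= \int_A \Bigl[(P\circ\varphi)\Bigl(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\Bigr)
 + (Q\circ\varphi)\Bigl(\frac{\partial x}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial z}{\partial u}\Bigr) \\
&\qquad\quad + (R\circ\varphi)\Bigl(\frac{\partial y}{\partial u}\frac{\partial z}{\partial v} - \frac{\partial y}{\partial v}\frac{\partial z}{\partial u}\Bigr)\Bigr]\,du\,dv.
\end{aligned}
\]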
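
To see the surface formula at work, here is a sketch with a specific map and form. Both are assumptions made only for illustration: $\varphi(u, v) = (u, v, u^2 + v^2)$ on the unit square $A = [0,1]\times[0,1]$, and $\omega = x\,dy\wedge dz$ (so $P = Q = 0$ and $R = x$).

```latex
% Pull back the coordinate differentials under \varphi(u,v) = (u, v, u^2 + v^2):
\[
  \varphi^* dy = dv, \qquad \varphi^* dz = 2u\,du + 2v\,dv ,
\]
\[
  \varphi^*(dy \wedge dz) = dv \wedge (2u\,du + 2v\,dv) = -2u \, du \wedge dv .
\]
% This agrees with the Jacobian expression
% (\partial y/\partial u)(\partial z/\partial v) - (\partial y/\partial v)(\partial z/\partial u) = 0 - 2u.
\[
  \int_{\varphi(A)} x \, dy \wedge dz
    = \int_0^1\!\!\int_0^1 u \cdot (-2u) \, du \, dv = -\tfrac23 .
\]
```

The negative sign reflects the orientation that the parametrization induces on the surface, not an error in the computation.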


We conclude this section with another look at the volume density of Riemann metrics, this time for an oriented manifold. If $M$ is an oriented manifold with a Riemann metric, then the volume density $\sigma$ corresponds to an $n$-form $\Omega$. By our rule for this correspondence, if $(U, \alpha)$ is a positive chart with coordinates $x^1,\ldots,x^n$, then

\[ \Omega = a\,dx^1\wedge\cdots\wedge dx^n, \]

where, by (4.1) of Chapter 10, $a(x) = |\det(g_{ij})|^{1/2}$ is the volume in $T_x(M)$ of the parallelepiped spanned by

\[ \frac{\partial}{\partial x^1}(x),\ \ldots,\ \frac{\partial}{\partial x^n}(x). \]

Let $e_1(x),\ldots,e_n(x)$ be an orthonormal basis of $T_x(M)$ (relative to the scalar product given by the Riemann metric). Then

\[ |\det(g_{ij})|^{1/2} = \Bigl|\det\Bigl(\Bigl(\frac{\partial}{\partial x^i}, e_j\Bigr)\Bigr)\Bigr| = |\det A|, \]

where $A = \bigl((\partial/\partial x^i, e_j)\bigr)$ is the matrix of the linear transformation carrying $e_i \to \partial/\partial x^i$. If $\omega^1(x),\ldots,\omega^n(x)$ is the dual basis of the $e$'s, then

\[ \omega^1(x)\wedge\cdots\wedge\omega^n(x) = \det A\; dx^1(x)\wedge\cdots\wedge dx^n(x). \]




Now $\omega^1(x),\ldots,\omega^n(x)$ can be any orthonormal basis of $T_x^*(M)$. [$T_x^*(M)$ has a scalar product, since it is the dual space of the scalar product space $T_x(M)$.] We thus get the following result: If $\omega^1,\ldots,\omega^n$ are linear differential forms such that for each $x \in M$, $\omega^1(x),\ldots,\omega^n(x)$ is an orthonormal basis of $T_x^*(M)$, then

\[ \Omega = \pm\,\omega^1\wedge\cdots\wedge\omega^n. \]

We can write

\[ \Omega = \omega^1\wedge\cdots\wedge\omega^n \tag{2.11} \]

if we know that $\omega^1\wedge\cdots\wedge\omega^n$ is a positive multiple of $dx^1\wedge\cdots\wedge dx^n$. Can we always find such forms $\omega^1,\ldots,\omega^n$ on $U$? The answer is “yes”: we can do it by applying the orthonormalization procedure to $dx^1,\ldots,dx^n$. That is, we set

\[ \omega^1 = \frac{dx^1}{\|dx^1\|}, \quad\text{where } \|dx^1\|(x) = \|dx^1(x)\| > 0 \text{ is a } C^\infty\text{-function on } U, \]

\[ \omega^2 = \frac{dx^2 - (dx^2, \omega^1)\,\omega^1}{\|dx^2 - (dx^2, \omega^1)\,\omega^1\|}, \quad\text{etc.} \]

The matrix which relates the $dx$'s to the $\omega$'s is composed of $C^\infty$-functions, so that the $\omega^i \in \Lambda^1(U)$. Furthermore, it is a triangular matrix with positive entries on the diagonal, so its determinant is positive. We have thus constructed the desired forms $\omega^1,\ldots,\omega^n$, so (2.11) holds. For instance, it follows from Eq. (9.10), Chapter 9, that $d\theta$, $\sin\theta\,d\varphi$ form an orthonormal basis for $T_x^*(S^2)$ at all $x \in S^2$ (except the north and south poles). If we choose the orientation on $S^2$ so that $\theta, \varphi$ form a positive chart, then the volume form is given by

\[ \Omega = \sin\theta\; d\theta\wedge d\varphi. \]
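
As a quick sanity check on this volume form, one can integrate it over the sphere. The sketch below assumes the usual coordinate ranges $0 < \theta < \pi$, $0 < \varphi < 2\pi$ for the chart, which cover $S^2$ except for a set of content zero; these ranges are an assumption here, not quoted from the text.

```latex
% The chart (\theta, \varphi) is positive, so the integral of \Omega is the
% ordinary integral of its coefficient \sin\theta over the coordinate rectangle.
\[
  \int_{S^2} \Omega
    = \int_0^{2\pi}\!\!\int_0^{\pi} \sin\theta \, d\theta \, d\varphi
    = 2\pi \bigl[ -\cos\theta \bigr]_0^{\pi}
    = 4\pi .
\]
```

This is the familiar area of the unit sphere.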


3. THE OPERATOR d


With every function $f$ we have associated a linear differential form $df$. We can thus regard $d$ as a map from $\Lambda^0(M)$ to $\Lambda^1(M)$. As such, it is linear and satisfies

\[ d(f_1 f_2) = f_2\,df_1 + f_1\,df_2. \]

We now seek to define a $d\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ for $k > 0$ as well. We shall require that $d$ be linear and satisfy some identity with regard to multiplication, generalizing the above formula for $d(f_1 f_2)$. The condition we will impose is that

\[ d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^p\,\omega_1\wedge d\omega_2 \]

if $\omega_1$ is a form of degree $p$. The factor $(-1)^p$ accounts for the anticommutativity of $\wedge$. The reader should check that $d$ is consistent with this law, at least to the extent that $d(\omega_1\wedge\omega_2) = (-1)^{pq}\,d(\omega_2\wedge\omega_1)$ if $\omega_1$ is of degree $p$ and $\omega_2$ is of degree $q$.
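
The consistency check left to the reader can be sketched as follows. It uses only the product rule just stated and the commutation law $\alpha\wedge\beta = (-1)^{rs}\beta\wedge\alpha$ for forms of degrees $r$ and $s$; note that $d\omega_1$ has degree $p+1$ and $d\omega_2$ has degree $q+1$.

```latex
% Apply the product rule to \omega_2 \wedge \omega_1 (\omega_2 has degree q):
\[
  d(\omega_2 \wedge \omega_1)
    = d\omega_2 \wedge \omega_1 + (-1)^q \, \omega_2 \wedge d\omega_1 .
\]
% Reorder each term with the commutation law:
\[
  d\omega_2 \wedge \omega_1 = (-1)^{(q+1)p} \, \omega_1 \wedge d\omega_2 ,
  \qquad
  \omega_2 \wedge d\omega_1 = (-1)^{q(p+1)} \, d\omega_1 \wedge \omega_2 .
\]
% Multiply by (-1)^{pq}; the exponents reduce mod 2 to p and 0:
\[
  (-1)^{pq} \, d(\omega_2 \wedge \omega_1)
    = (-1)^{2pq + p} \, \omega_1 \wedge d\omega_2
      + (-1)^{2pq + 2q} \, d\omega_1 \wedge \omega_2
    = d\omega_1 \wedge \omega_2 + (-1)^p \, \omega_1 \wedge d\omega_2 .
\]
```

The right-hand side is exactly the imposed formula for $d(\omega_1\wedge\omega_2)$, so the two ways of computing agree.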

We are going to impose one further condition on $d$ which will uniquely determine it. This condition (which lies at the heart of the matter) requires some introduction. Let $f$ be a differentiable function, and let $C\colon I \to M$ be a differentiable curve. For any points $a, b \in I$, the fundamental theorem of the calculus implies that

\[ f(C(b)) - f(C(a)) = \int_a^b (f\circ C)'(t)\,dt = \int_a^b C^*(df). \]

Fig. 11.5


We can regard $b$ and $a$ (with $\pm$ signs attached) as the “oriented boundary” of the interval $[a, b]$. Let us make the convention that “integrating” an element of $\Lambda^0(p)$ is just evaluating the function at the point $p$. As such, the equation above says that the integral of the “pullback” of $f$ over the “boundary”, that is, $f(b) - f(a)$, equals the integral of the “pullback” of $df$ over $[a, b]$. In some sense, we would like to be able to say that if $\omega$ is a form of degree $k$, then the integral of the “pullback” of $\omega$ over the “$k$-dimensional boundary” of a $(k+1)$-dimensional region is equal to the integral of the pullback of $d\omega$ over the $(k+1)$-dimensional region. Without trying to make this requirement precise, let us see what it says for the case where $k = 1$ and the region is a triangle in the plane. Let $\varphi$ be a smooth map of some neighborhood of the triangle $\Delta \subset \mathbb{R}^2$ into $M$, and let the vertices of $\Delta$ be mapped by $\varphi$ into $x$, $y$, and $z$ (see Fig. 11.5). The boundary of $\Delta$ consists of three curves (segments) $C_1$, $C_2$, and $C_3$ (with the proper orientations). Let $\omega$ be a linear differential form on $M$. We would then expect that

\[ \int_\Delta \varphi^*\,d\omega = \int_{C_1}\varphi^*\omega + \int_{C_2}\varphi^*\omega + \int_{C_3}\varphi^*\omega. \]

If $\omega = df$, then the three integrals on the right become (by the fundamental theorem of the calculus) $f(y) - f(x) + f(z) - f(y) + f(x) - f(z) = 0$. Thus $\int_\Delta \varphi^*\,d(df) = 0$. Since the triangle was arbitrary, we expect that

\[ d(df) = 0. \]
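We now assert:


Theorem 3.1. There exists a unique linear map $d\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ such that on $\Lambda^0$ it coincides with the old $d$ and such that

\[ d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^p\,\omega_1\wedge d\omega_2 \quad\text{if } \omega_1 \in \Lambda^p(M) \tag{3.1} \]

and

\[ d(df) = 0 \quad\text{if } f \in \Lambda^0(M). \tag{3.2} \]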




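Proof. We first establish the uniqueness of $d$. To do this we observe that (3.1) implies that $d$ is local, in the sense that if $\omega = \omega'$ on some open set $U$, then $d\omega = d\omega'$ on $U$. In fact, let $W$ be an open set with $\overline{W} \subset U$, and let $\varphi$ be a $C^\infty$-function such that $\varphi(x) = 1$ for $x \in W$ and $\operatorname{supp}\varphi \subset U$. Then $\varphi\omega = \varphi\omega'$ everywhere on $M$, and thus $d(\varphi\omega) = d(\varphi\omega')$. But, by (3.1), $d(\varphi\omega) = \varphi\,d\omega + d\varphi\wedge\omega = d\omega$ on $W$, since $\varphi = 1$ and $d\varphi = 0$ there, and in the same way $d(\varphi\omega') = d\omega'$ on $W$. Thus $d\omega = d\omega'$ on $W$. Since $W$ can be arbitrary, we conclude that $d\omega = d\omega'$ on $U$.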

Let $(U, \alpha)$ be a chart with coordinates $x^1,\ldots,x^n$. Every $\omega \in \Lambda^k(M)$ can be written as

\[ \omega = \sum_{i_1<\cdots<i_k} a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \]

Now [by induction on $k$, using (3.1) and (3.2)] $d(dx^{i_1}\wedge\cdots\wedge dx^{i_k}) = 0$. Thus (3.1) implies that

\[ d\omega = \sum da_{i_1\ldots i_k}\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \tag{3.3} \]

Equation (3.3) gives a local formula for $d$. It also shows that $d$ is unique. In fact, we have shown that there is at most one operator $d$ on any open subset $O \subset M$ mapping $\Lambda^k(O) \to \Lambda^{k+1}(O)$ and satisfying the hypotheses of the theorem (for $O$). On the set $O \cap U$ it must be given by (3.3).

We now claim that in order to establish the existence of $d$, it suffices to show that the $d$ given by (3.3) [in any chart $(U, \alpha)$] satisfies the requirement of the theorem on $\Lambda^k(U)$. In fact, suppose we have shown this to be so. Let $\mathcal{A}$ be an atlas of $M$, and for each chart $(U, \alpha) \in \mathcal{A}$ define the operator $d_\alpha\colon \Lambda^k(U) \to \Lambda^{k+1}(U)$ by (3.3). We would like to set $d\omega = d_\alpha\omega$ on $U$. For this to be consistent, we must show that $d_\alpha\omega = d_\beta\omega$ on $U \cap W$ if $(W, \beta)$ is some other chart. But both $d_\alpha$ and $d_\beta$ satisfy the hypotheses of the theorem on $U \cap W$, and they must therefore coincide there.

Thus to prove the theorem, it suffices to check that the operator $d$, defined by (3.3), fulfills our requirements as a map of $\Lambda^k(U) \to \Lambda^{k+1}(U)$. It is obviously linear. To check (3.2), we observe that

\[ df = \sum_i \frac{\partial f}{\partial x^i}\,dx^i, \]

so

\[ d(df) = \sum_{i,j} \frac{\partial^2 f}{\partial x^j\,\partial x^i}\,dx^j\wedge dx^i = \sum_{i<j}\Bigl(\frac{\partial^2 f}{\partial x^i\,\partial x^j} - \frac{\partial^2 f}{\partial x^j\,\partial x^i}\Bigr)\,dx^i\wedge dx^j = 0 \]

by the equality of mixed partials.
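Now we turn to (3.1). Since both sides of (3.1) are linear in $\omega_1$ and $\omega_2$ separately, it suffices to check (3.1) for $\omega_1 = a\,dx^{i_1}\wedge\cdots\wedge dx^{i_p}$ and $\omega_2 = b\,dx^{j_1}\wedge\cdots\wedge dx^{j_q}$. Now $\omega_1\wedge\omega_2 = ab\,dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q}$ and $d(ab) = b\,da + a\,db$; therefore,

\[
\begin{aligned}
d(\omega_1\wedge\omega_2) &= b\,da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q} \\
&\quad + a\,db\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q},
\end{aligned}
\]

while

\[
\begin{aligned}
d\omega_1\wedge\omega_2 &= (da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p})\wedge(b\,dx^{j_1}\wedge\cdots\wedge dx^{j_q}) \\
&= b\,da\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q}
\end{aligned}
\]

and

\[
\begin{aligned}
\omega_1\wedge d\omega_2 &= (a\,dx^{i_1}\wedge\cdots\wedge dx^{i_p})\wedge(db\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q}) \\
&= (-1)^p\,a\,db\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_p}\wedge dx^{j_1}\wedge\cdots\wedge dx^{j_q},
\end{aligned}
\]

so we see that (3.1) holds. This proves the theorem. $\square$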
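
The local formula (3.3) can be made concrete in $\mathbb{R}^3$. The sketch below applies it to a general linear differential form with coefficient functions $P$, $Q$, $R$ (names introduced here only for the example) and recovers the components of the classical curl.

```latex
% Take \omega = P\,dx + Q\,dy + R\,dz on R^3. By (3.3),
% d\omega = dP \wedge dx + dQ \wedge dy + dR \wedge dz, where for instance
\[
  dP \wedge dx
    = \Bigl( \frac{\partial P}{\partial y}\,dy + \frac{\partial P}{\partial z}\,dz \Bigr) \wedge dx
    = -\frac{\partial P}{\partial y}\,dx \wedge dy - \frac{\partial P}{\partial z}\,dx \wedge dz .
\]
% Collecting the three contributions:
\[
  d\omega
    = \Bigl( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Bigr) dx \wedge dy
    + \Bigl( \frac{\partial R}{\partial x} - \frac{\partial P}{\partial z} \Bigr) dx \wedge dz
    + \Bigl( \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} \Bigr) dy \wedge dz .
\]
```

If $\omega = df$, each coefficient is a difference of mixed partials and vanishes, in agreement with (3.2).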


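We can draw a number of important corollaries from Eq. (3.3).
First of all, it follows immediately that for $\omega \in \Lambda^k(M)$, for any $k$, we have

\[ d(d\omega) = 0. \tag{3.4} \]

(Remember we merely assumed it for $k = 0$.)
Secondly, let $\varphi\colon M_1 \to M_2$ be a differentiable map. Then for $\omega \in \Lambda^k(M_2)$ we have

\[ d\varphi^*\omega = \varphi^*\,d\omega. \tag{3.5} \]

To check (3.5), it suffices to verify it for any pair of compatible charts. But if $x^1,\ldots,x^n$ are coordinates on $M_2$ and, locally,

\[ \omega = \sum a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}, \]

we have

\[
\begin{aligned}
d\varphi^*\omega &= d\sum \varphi^*(a_{i_1\ldots i_k})\,\varphi^*dx^{i_1}\wedge\cdots\wedge\varphi^*dx^{i_k} \\
&= \sum d\varphi^*(a_{i_1\ldots i_k})\wedge d\varphi^*x^{i_1}\wedge\cdots\wedge d\varphi^*x^{i_k} \\
&= \sum \varphi^*da_{i_1\ldots i_k}\wedge\varphi^*dx^{i_1}\wedge\cdots\wedge\varphi^*dx^{i_k} \\
&= \varphi^*\Bigl(\sum da_{i_1\ldots i_k}\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k}\Bigr) \\
&= \varphi^*\,d\omega.
\end{aligned}
\]

In particular, if $X$ is a vector field on $M$, we conclude that

\[ D_X\,d\omega = d(D_X\omega). \tag{3.6} \]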
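
Identity (3.5) can be checked by hand in a familiar case. The map and form below are illustrative assumptions: $\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$, the polar coordinate map into $\mathbb{R}^2$, and $\omega = x\,dy - y\,dx$.

```latex
% Pullbacks of the coordinate differentials:
\[
  \varphi^* dx = \cos\theta\,dr - r\sin\theta\,d\theta ,
  \qquad
  \varphi^* dy = \sin\theta\,dr + r\cos\theta\,d\theta .
\]
% Left side of (3.5): pull back first, then differentiate.
\[
  \varphi^*\omega
    = r\cos\theta\,(\sin\theta\,dr + r\cos\theta\,d\theta)
    - r\sin\theta\,(\cos\theta\,dr - r\sin\theta\,d\theta)
    = r^2\,d\theta ,
  \qquad
  d\varphi^*\omega = 2r\,dr \wedge d\theta .
\]
% Right side of (3.5): differentiate first, then pull back.
\[
  d\omega = 2\,dx \wedge dy ,
  \qquad
  \varphi^*(dx \wedge dy) = r\,dr \wedge d\theta ,
  \qquad
  \varphi^* d\omega = 2r\,dr \wedge d\theta .
\]
```

Both orders of operation give $2r\,dr\wedge d\theta$.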


EXERCISES 


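3.1 Compute $d$ of the following differential forms.
a) $\gamma = \sum_i (-1)^{i-1} x_i\,dx_1\wedge\cdots\wedge dx_{i-1}\wedge dx_{i+1}\wedge\cdots\wedge dx_n$
b) $r^{-n}\gamma$, where $\gamma$ is as in (a) and $r = \{x_1^2 + \cdots + x_n^2\}^{1/2}$
c) $\sum p_i\,dq_i$
d) $\sin(x^2 + y^2 + z^2)(x\,dx + y\,dy + z\,dz)$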




Let $V$ be a vector space equipped with a nonsingular bilinear form and an orientation. Then we can define the $*$-operator as in Chapter 7. Since we identify the tangent space $T_x(V)$ with $V$ for any $x \in V$, we can consider the $*$-operator as mapping $\Lambda^k(V) \to \Lambda^{n-k}(V)$. For instance, in $\mathbb{R}^2$, with the rectangular coordinates $(x, y)$, we have

\[ *dx = dy, \qquad *dy = -dx, \]

and so on.

3.2 Show that

\[ d{*}df = \Bigl(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\Bigr)\,dx\wedge dy \]

for any function $f$ on $\mathbb{R}^2$.

3.3 Obtain a similar expression for $d{*}df$ in $\mathbb{R}^n$ with its usual scalar product. [Recall that

\[ *(dx^1\wedge\cdots\wedge dx^k) = dx^{k+1}\wedge\cdots\wedge dx^n \]

and, more generally,

\[ *(dx^{i_1}\wedge\cdots\wedge dx^{i_k}) = \pm\,dx^{j_1}\wedge\cdots\wedge dx^{j_{n-k}}, \]

where $(i_1,\ldots,i_k,j_1,\ldots,j_{n-k})$ is a permutation of $(1,\ldots,n)$ and the $\pm$ is the sign of the permutation.]

3.4 Let $x, y, z, t$ be coordinates on $\mathbb{R}^4$. Introduce a scalar product on the tangent space at each point so that

\[ (dx, dx) = (dy, dy) = (dz, dz) = 1, \]
\[ (dx, dy) = (dx, dz) = (dx, dt) = (dy, dz) = (dy, dt) = (dz, dt) = 0, \]

and

\[ (c\,dt, c\,dt) = -1, \]

where $c$ is a positive constant. Let the two-form $\omega$ be given by

\[ \omega = c\,(E_1\,dx\wedge dt + E_2\,dy\wedge dt + E_3\,dz\wedge dt) + B_1\,dy\wedge dz + B_2\,dz\wedge dx + B_3\,dx\wedge dy. \]

Let the three-form $\gamma$ be given by

\[ \gamma = \rho\,dx\wedge dy\wedge dz - (J_1\,dy\wedge dz + J_2\,dz\wedge dx + J_3\,dx\wedge dy)\wedge dt. \]

Write the equations

\[ d\omega = 0, \qquad d{*}\omega = 4\pi\gamma \]

as equations involving the various coefficients and their partial derivatives.


4. STOKES’ THEOREM 


In this section we shall prove a theorem which will be a far-reaching generalization of the fundamental theorem of the calculus of one variable. It should, perhaps, be called the fundamental theorem of the calculus of several variables. We first make some definitions.

Let $D$ be a domain with regular boundary in a manifold $M$. We recall (page 419) that each point of $M$ lies in a chart $(U, \alpha)$ which is one of three types, (i), (ii), or (iii).




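Let $(U, \alpha)$ and $(W, \beta)$ be two charts of $M$ of type (iii). Then, as on page 420, the matrix of $J_{\beta\circ\alpha^{-1}}$ is given by

\[
\begin{pmatrix}
\dfrac{\partial y^1}{\partial x^1} & \cdots & \dfrac{\partial y^1}{\partial x^{n-1}} & \dfrac{\partial y^1}{\partial x^n} \\
\vdots & & \vdots & \vdots \\
\dfrac{\partial y^{n-1}}{\partial x^1} & \cdots & \dfrac{\partial y^{n-1}}{\partial x^{n-1}} & \dfrac{\partial y^{n-1}}{\partial x^n} \\
0 & \cdots & 0 & \dfrac{\partial y^n}{\partial x^n}
\end{pmatrix},
\]

and so

\[
\det J_{\beta\circ\alpha^{-1}} = \frac{\partial y^n}{\partial x^n}\times\det
\begin{pmatrix}
\dfrac{\partial y^1}{\partial x^1} & \cdots & \dfrac{\partial y^1}{\partial x^{n-1}} \\
\vdots & & \vdots \\
\dfrac{\partial y^{n-1}}{\partial x^1} & \cdots & \dfrac{\partial y^{n-1}}{\partial x^{n-1}}
\end{pmatrix}
= \frac{\partial y^n}{\partial x^n}\times\det J_{(\beta\upharpoonright\partial D)\circ(\alpha\upharpoonright\partial D)^{-1}}. \tag{4.1}
\]

Furthermore, $y^n(x^1,\ldots,x^n) > 0$ if $x^n > 0$, since $\alpha(U\cap W)\cap\{v : v^n > 0\} = \alpha(U\cap W\cap\operatorname{int} D)$. Thus $\partial y^n/\partial x^n > 0$ at a boundary point where $x^n = 0$.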

Now suppose that $M$ is an oriented manifold, and let $D \subset M$ be a domain with regular boundary. We shall make $\partial D$ into an oriented manifold. We say that an atlas $\mathcal{A}$ is adjusted if each $(U, \alpha) \in \mathcal{A}$ is of type (i), (ii), or (iii) and, in addition, if each chart of $\mathcal{A}$ is positive.

If $\dim M > 1$, we can always find an adjusted atlas. In fact, by choosing the $U$ connected, we find that every $(U, \alpha)$ is either positive or negative. If $(U, \alpha)$ is negative, we replace it by $(U, \alpha')$, where $x^1_{\alpha'} = -x^1_\alpha$.

If $\dim M = 1$, then $\partial D$ consists of a discrete set of points (which we can regard as a “zero-dimensional manifold”). Each $x \in \partial D$ lies in a chart of type (iii) which is either positive or negative. We assign a plus sign to $x$ if any chart (and hence all connected charts) of type (iii) is negative. We assign a minus sign to $x$ if its charts of type (iii) are positive. In this way we “orient” $\partial D$, as shown in Fig. 11.6.


Fig. 11.6
If $\dim M > 1$, we choose an adjusted (oriented) atlas on $M$. It then follows from (4.1) and the fact that $\partial y^n/\partial x^n > 0$ that

\[ \det J_{(\beta\upharpoonright\partial D)\circ(\alpha\upharpoonright\partial D)^{-1}} > 0. \]

This shows that $(U\upharpoonright\partial D, \alpha\upharpoonright\partial D)$ is an oriented atlas on $\partial D$. We thus get an orientation on $\partial D$. This is not quite the orientation we want on $\partial D$. For reasons that will soon become apparent, we choose the orientation on $\partial D$ so that $(U\upharpoonright\partial D, \alpha\upharpoonright\partial D)$ has the same sign as $(-1)^n$. That is, $(U\upharpoonright\partial D, \alpha\upharpoonright\partial D)$ is a positive chart if $n$ is even, and we take the orientation opposite to that determined by $(U\upharpoonright\partial D, \alpha\upharpoonright\partial D)$ if $n$ is odd. We can now state our main theorem.


Theorem 4.1 (Stokes' theorem). Let $M$ be an $n$-dimensional oriented manifold, and let $D \subset M$ be a domain with regular boundary. Let $\partial D$ denote the boundary of $D$ regarded as an oriented manifold. Then for any $\omega \in \Lambda^{n-1}(M)$ with compact support we have

\[ \int_{\partial D} \iota^*\omega = \int_D d\omega, \tag{4.2} \]

where, as usual, $\iota$ is the injection of $\partial D$ into $M$.


Proof. For $n = 1$ this is just the fundamental theorem of the calculus.

For $n > 1$ our proof is almost exactly the same as the proof of Theorem 6.1 of Chapter 10. Choose an adjusted atlas $\mathcal{A}$ and a partition of unity $\{g_j\}$ subordinate to $\mathcal{A}$. Since $\omega$ has compact support, we can write

\[ \omega = \sum g_j\omega, \]

where the sum is finite. Since both sides of (4.2) are linear, it suffices to verify (4.2) for each of the summands $g_j\omega$. Since $\operatorname{supp} g_j\omega \subset U$, where $(U, \alpha) \in \mathcal{A}$, we must check the three possibilities: $(U, \alpha)$ satisfies (i), (ii), or (iii).

If $(U, \alpha)$ satisfies (i), $\iota^*(g_j\omega) = 0$, since $\operatorname{supp} g_j\omega \cap \partial D = \emptyset$, and

\[ \int_D d(g_j\omega) = 0, \]

since $D \cap \operatorname{supp} g_j\omega = \emptyset$. Thus both sides of (4.2) vanish.

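If $(U, \alpha)$ satisfies (ii), the left-hand side of (4.2) vanishes. We must show that the same holds for the right-hand side. Let $x^1,\ldots,x^n$ be the coordinates on $(U, \alpha)$, and write

\[ g_j\omega = a_1\,dx^2\wedge\cdots\wedge dx^n + a_2\,dx^1\wedge dx^3\wedge\cdots\wedge dx^n + \cdots + a_n\,dx^1\wedge\cdots\wedge dx^{n-1}. \]

Then

\[ d(g_j\omega) = \sum_i (-1)^{i-1}\frac{\partial a_i}{\partial x^i}\,dx^1\wedge\cdots\wedge dx^n, \]

and thus

\[ \int_D d(g_j\omega) = \sum_i (-1)^{i-1}\int_{\mathbb{R}^n}\frac{\partial a_i}{\partial x^i}. \]

Since $g_j\omega$ has compact support, the functions $a_i$ have compact support, and we can replace the integral over $\mathbb{R}^n$ by the integral over $\square_{-\mathbf{R}}^{\mathbf{R}}$, where $\mathbf{R} = \langle R,\ldots,R\rangle$ and $R$ is chosen so large that $\operatorname{supp} a_i \subset \square_{-\mathbf{R}}^{\mathbf{R}}$. But writing the multiple integral as an iterated integral, we get

\[ \int_{\square_{-\mathbf{R}}^{\mathbf{R}}}\frac{\partial a_i}{\partial x^i} = \int\bigl[a_i(\ldots,R,\ldots) - a_i(\ldots,-R,\ldots)\bigr]\,dx^1\cdots\widehat{dx^i}\cdots dx^n = 0, \]

since $a_i(\ldots,R,\ldots) = a_i(\ldots,-R,\ldots) = 0$.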


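Fig. 11.7


We now examine $\int_D d(g_j\omega)$ in case (iii). The argument proceeds exactly as before, except that we must compute $\int_{\{x^n \ge 0\}} \partial a_i/\partial x^i$ instead of $\int_{\mathbb{R}^n} \partial a_i/\partial x^i$. (See Fig. 11.7.)

We can now replace the region of integration by a rectangle of the form $\square_{\langle -R,\ldots,-R,0\rangle}^{\langle R,\ldots,R\rangle}$ for large $R$. If $i < n$, $\int \partial a_i/\partial x^i = 0$ as before. If $i = n$, we get

\[ \int_0^R \frac{\partial a_n}{\partial x^n}\,dx^n = -a_n(\cdot,\ldots,\cdot,0), \]

so that

\[ \int_D d(g_j\omega) = \sum_i (-1)^{i-1}\int\frac{\partial a_i}{\partial x^i} = (-1)^n\int_{\mathbb{R}^{n-1}} a_n(\cdot,\ldots,\cdot,0). \]

Now since $x^n = 0$ on $U\cap\partial D$, we see that $\iota^*dx^n = 0$. Thus

\[ \iota^*(g_j\omega) = (\iota^*a_n)(\iota^*dx^1)\wedge\cdots\wedge(\iota^*dx^{n-1}), \]

or if (by abuse of notation) we regard $x^1,\ldots,x^{n-1}$ as the coordinates of $(U\upharpoonright\partial D, \alpha\upharpoonright\partial D)$, we get

\[ \iota^*(g_j\omega) = a_n(\cdot,\ldots,\cdot,0)\,dx^1\wedge\cdots\wedge dx^{n-1}. \]

In view of the choice we made for the orientation of $\partial D$, we conclude that

\[ \int_{\partial D}\iota^*(g_j\omega) = (-1)^n\int_{\mathbb{R}^{n-1}} a_n(\cdot,\ldots,\cdot,0) = \int_D d(g_j\omega). \]

This completes the proof of the theorem. $\square$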
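
A minimal worked instance of (4.2) in the plane is sketched below. The choices are assumptions made for illustration: $M = \mathbb{R}^2$ with its standard orientation, $D$ the unit disk, and $\omega = x\,dy$. For $n = 2$ the sign $(-1)^n$ is $+1$, and the resulting orientation of the boundary circle is taken here to be the counterclockwise one, parametrized by $C(t) = (\cos t, \sin t)$.

```latex
% Boundary side of (4.2): along C(t) = (\cos t, \sin t), dy pulls back to \cos t\,dt.
\[
  \int_{\partial D} \iota^*\omega
    = \int_0^{2\pi} \cos t \cdot \cos t \, dt
    = \pi .
\]
% Interior side of (4.2): d\omega = dx \wedge dy, whose integral is the area of the disk.
\[
  \int_D d\omega = \int_D dx \wedge dy = \pi .
\]
```

The two sides agree, which is the classical statement that $\oint x\,dy$ computes enclosed area.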


Theorem 4.1, like the divergence theorem, is not sufficiently broad for us to apply to more general domains. For this purpose, we will again use the notion of a domain with almost regular boundary.

We have already seen that the set of $x \in \partial D$ having a neighborhood of type (iii) forms a differentiable manifold. (Recall that these points need not exhaust all of $\partial D$.) Similarly, if $M$ is an oriented manifold, then this collection of points becomes an oriented manifold (with $(-1)^n$ times the induced orientation, as before). By abuse of language we shall denote this oriented manifold by $\partial D$. Thus $\partial D$ is an oriented manifold which, as a set, is not the whole boundary of $D$ but only its “regular” points, that is, the points having a neighborhood of type (iii).




Theorem 4.2 (Stokes' theorem). Let $M$ be an $n$-dimensional oriented manifold, and let $D \subset M$ be a domain with almost regular boundary. Let $\partial D$ be as above, and let $\iota$ be the injection of $\partial D \to M$. Then for any $\omega \in \Lambda^{n-1}(M)$ with compact support we have

\[ \int_{\partial D}\iota^*\omega = \int_D d\omega. \tag{4.2} \]


Proof. The proof proceeds as before. We choose an adjusted atlas and a partition of unity $\{g_j\}$ subordinate to the atlas. We write $\omega = \sum g_j\omega$ and now have four cases to consider. The first three cases have been handled already. The new case is where

\[ g_j\omega = \sum_i a_i\,dx^1\wedge\cdots\wedge\widehat{dx^i}\wedge\cdots\wedge dx^n, \]

where the $\widehat{\ }$ indicates that $dx^i$ is to be omitted, has its support contained in $U$, where $(U, \alpha)$ is a chart of type (iv). By linearity, it suffices to verify (4.2) for each summand on the right, i.e., for

\[ a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n. \]

Now $\iota^*(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = 0$ unless $j \ge k$, since $dx^p$ vanishes on the piece of $\partial D\cap U$ whose image under $\alpha$ lies in $H_p$. If $j < k$, then all these $dx^p$ occur, and thus $\iota^*(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = 0$. If $j \ge k$, then $\iota^*(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n)$ vanishes everywhere except on the portion of $\partial D$ which maps under $\alpha$ onto $H_j$.

On the other hand,

\[ d(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = (-1)^{j-1}\frac{\partial a_j}{\partial x^j}\,dx^1\wedge\cdots\wedge dx^n. \]

We can evaluate the integral $\int_D$ by integrating over the rectangle

\[ \square_{\langle -R,\ldots,-R,0,\ldots,0\rangle}^{\langle R,\ldots,R\rangle} \]

(where the $-R$'s extend through the $(k-1)$th position). Integrating first with regard to $x^j$, we obtain

\[ \int_D d(a_j\,dx^1\wedge\cdots\wedge\widehat{dx^j}\wedge\cdots\wedge dx^n) = (-1)^j\int_{H_j} a_j. \]

On the other hand, the orientation on $H_j$ is such that this integral has the sign necessary to make (4.2) hold. This proves Theorem 4.2. $\square$


As before, we can apply Theorems 4.1 and 4.2 to still more general domains by using a limit argument. For instance, Theorem 4.2, as stated, does not apply to the domain $D$ in Fig. 11.8, because the curves $C_1$ and $C_2$ are tangent at $P$. It does apply, however, to the approximating domain $D'$ obtained by “breaking off a little piece” (Fig. 11.9), and it is clear that the values of both sides of (4.2) for $D'$ are close to those for $D$. We thus obtain (4.2) for $D$ by passing to the limit. As before, we will not state a more general theorem covering these cases. It will be clear in each instance how to apply a limit argument.




Fig. 11.8 Fig. 11.9 


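Since the statement and proof of Stokes' theorem are so close to those of the divergence theorem, the reader might suspect that one implies the other. On an oriented manifold, the divergence theorem is, indeed, a corollary of Stokes' theorem. To see this, let $\Omega$ be an element of $\Lambda^n(M)$ corresponding to the density $\rho$. If $X$ is a vector field, then the $n$-form $D_X\Omega$ clearly corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Anticipating some notation that we shall introduce in Section 6, let $X \lrcorner\, \Omega$ be the $(n-1)$-form defined by

\[ (X \lrcorner\, \Omega)(\xi^1,\ldots,\xi^{n-1}) = (-1)^{n-1}\,\Omega(\xi^1,\ldots,\xi^{n-1}, X). \]

In terms of coordinates, if $\Omega = a\,dx^1\wedge\cdots\wedge dx^n$, then

\[ X \lrcorner\, \Omega = a\bigl[X^1\,dx^2\wedge\cdots\wedge dx^n - X^2\,dx^1\wedge dx^3\wedge\cdots\wedge dx^n + \cdots + (-1)^{n-1}X^n\,dx^1\wedge\cdots\wedge dx^{n-1}\bigr]. \]

Note that

\[ d(X \lrcorner\, \Omega) = \Bigl(\sum_i \frac{\partial(aX^i)}{\partial x^i}\Bigr)\,dx^1\wedge\cdots\wedge dx^n, \]

which is exactly the $n$-form $D_X\Omega$, since it corresponds to the density $D_X\rho = \operatorname{div}\langle X, \rho\rangle$. Thus, by Stokes' theorem,

\[ \int_{\partial D}\iota^*(X \lrcorner\, \Omega) = \int_D d(X \lrcorner\, \Omega) = \int_D \operatorname{div}\langle X, \rho\rangle. \]

We must compare $X \lrcorner\, \Omega$ with the density $\rho_X$ on $\partial D$. By (2.2) they agree on everything up to sign. To check that the signs agree, it suffices to compare

\[ \rho_X\Bigl(\frac{\partial}{\partial x^1},\ldots,\frac{\partial}{\partial x^{n-1}}\Bigr) = \rho\Bigl(\frac{\partial}{\partial x^1},\ldots,\frac{\partial}{\partial x^{n-1}}, X\Bigr) \]

with

\[ (X \lrcorner\, \Omega)\Bigl(\frac{\partial}{\partial x^1},\ldots,\frac{\partial}{\partial x^{n-1}}\Bigr) \]

at any $x \in \partial D$. Now

\[ \iota^*(X \lrcorner\, \Omega) = (-1)^{n-1}\,a\,X^n\,dx^1\wedge\cdots\wedge dx^{n-1} \]

and, according to our convention, $x^1,\ldots,x^{n-1}$ is a positive or negative coordinate system according to the sign of $(-1)^n$. Thus the two coincide if and only if $X^n$ is negative, that is,

\[ \int_{\partial D}\iota^*(X \lrcorner\, \Omega) = \int_{\partial D}\varepsilon_X\,\rho_X. \]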




EXERCISES 


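4.1 Compute the following surface integrals both directly and by using Stokes' theorem. Let $\square$ denote the unit cube, and let $B$ be the unit ball in $\mathbb{R}^3$.
a) $\int_{\partial\square} x\,dy\wedge dz + y\,dz\wedge dx + z\,dx\wedge dy$
b) $\int_{\partial B} x^2\,dy\wedge dz$
c) $\int_{\partial\square} \cos z\,dx\wedge dy$
d) $\int_{\partial U} x\,dy\wedge dz$, where
$U = \{(x, y, z) : x \ge 0,\ y \ge 0,\ z \ge 0,\ x^2 + y^2 + z^2 \le 1\}$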


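4.2 Let $\omega = yz\,dx + x\,dy + dz$. Let $\gamma$ be the unit circle in the plane oriented in the counterclockwise direction. Compute $\int_\gamma \omega$. Let

\[ A_1 = \{(x, y, z) : z = 0,\ x^2 + y^2 \le 1\}, \]
\[ A_2 = \{(x, y, z) : z = 1 - x^2 - y^2,\ x^2 + y^2 \le 1\}. \]

Orient the surfaces $A_1$ and $A_2$ so that $\partial A_1 = \partial A_2 = \gamma$. Verify that $\int_{A_1} d\omega = \int_{A_2} d\omega = \int_\gamma \omega$ by computing the integrals.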
4.3 Let $S^1$ be the circle and define $\omega = (1/2\pi)\,d\theta$, where $\theta$ is the angular coordinate.

a) Let $\varphi\colon S^1 \to S^1$ be a differentiable map. Show that $\int\varphi^*\omega$ is an integer. This integer is called the degree of $\varphi$ and is denoted by $\deg\varphi$.

b) Let $\varphi_t$ be a collection of maps (one for each $t$) which depends differentiably on $t$. Show that $\deg\varphi_0 = \deg\varphi_1$.

c) Let us regard $S^1$ as the unit circle in the complex numbers. Let $f$ be some function on the complex numbers, and suppose that $f(z) \neq 0$ for $|z| = r$. Define $\varphi_{r,f}$ by setting $\varphi_{r,f}(e^{i\theta}) = f(re^{i\theta})/|f(re^{i\theta})|$. Suppose $f(z) = z^n$. Compute $\deg\varphi_{r,f}$ for $r \neq 0$.

d) Let $f$ be a polynomial of degree $n \ge 1$. Thus

\[ f(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_0, \]

where $a_n \neq 0$. Show that there is at least one complex number $z_0$ at which $f(z_0) = 0$. [Hint: Suppose the contrary. Then $\varphi_{r,f}$ is defined for all $0 \le r < \infty$ and $\deg\varphi_{r,f} = \text{const}$, by (b). Evaluate $\lim_{r\to 0}$ and $\lim_{r\to\infty}$ of this expression.]




Let $X$ be a vector field defined in some neighborhood $U$ of the origin in $E^2$, and suppose that $X(0) = 0$ and that $X(x) \neq 0$ for $x \neq 0$. Thus $X$ vanishes only at the origin. Define the map $\varphi_r\colon S^1 \to S^1$ by

\[ \varphi_r(e^{i\theta}) = \frac{X(re^{i\theta})}{\|X(re^{i\theta})\|}. \]

This map is defined for sufficiently small $r$. By Exercise 4.3(b) the degree of this map does not depend on $r$. This degree is called the index of the vector field $X$ at the origin.


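4.4 Compute the index of
a) $x\,\dfrac{\partial}{\partial x} + y\,\dfrac{\partial}{\partial y}$,
b) $x\,\dfrac{\partial}{\partial x} - y\,\dfrac{\partial}{\partial y}$,
c) $y\,\dfrac{\partial}{\partial x} - x\,\dfrac{\partial}{\partial y}$.
d) Construct a vector field with index 2.
e) Show that the index of $-X$ is the same as the index of $X$ for any vector field $X$.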




4.5 Let $X$ be a vector field on an oriented two-dimensional manifold, and suppose that $X(p) = 0$ for some $p \in M$ and that $X$ does not vanish at any other point in a small neighborhood of $p$. By choosing an oriented chart mapping $p$ into zero, we get a vector field on $E^2$ vanishing at the origin. Show that the index of this vector field does not depend on the choice of charts. We can thus define the index of $X$ at $p$.

4.6 a) On the sphere $S^2$ let $X$ be a vector field which is tangent to the meridian circles everywhere and vanishes only at the north and south poles. What is its index at each pole?

b) Let $Y$ be a vector field which is tangent to the circles of latitude everywhere and vanishes only at the north and south poles. What is its index at each pole?


5. SOME ILLUSTRATIONS OF STOKES’ THEOREM 
As a simple but important corollary of Theorem 4.2, we state: 


Theorem 5.1. Let $\varphi\colon M_1 \to M_2$ be a differentiable map of the oriented $k$-dimensional manifold $M_1$ into the $n$-dimensional manifold $M_2$. Let $\omega$ be a form of degree $k - 1$ on $M_2$, and let $D \subset M_1$ be a domain with almost regular boundary on $M_1$. Then we have

\[ \int_{\partial D}\iota^*\varphi^*\omega = \int_D \varphi^*(d\omega). \tag{5.1} \]

Equation (5.1) follows directly from (4.2) and from the fact that $\varphi^*d = d\varphi^*$.

Fig. 11.10

We can regard the right-hand side of (5.1) as the integral of $d\omega$ over the “oriented $k$-dimensional hypersurface” $\varphi(D)$. Equation (5.1) says that this integral is equal to the integral of $\omega$ over the $(k-1)$-dimensional hypersurface $\varphi(\partial D)$.


Fig. 11.11 


We now give a simple application of Theorem 5.1. Let $C_0\colon [0, 1] \to M$ and $C_1\colon [0, 1] \to M$ be two differentiable curves with $C_0(0) = C_1(0) = p$ and $C_0(1) = C_1(1) = q$. (See Fig. 11.10.) We say that $C_0$ and $C_1$ are (differentiably) homotopic if there exists a differentiable map $\varphi$ of a neighborhood of the unit square $[0, 1]\times[0, 1] \subset \mathbb{R}^2$ into $M$ such that $\varphi(t, 0) = C_0(t)$, $\varphi(t, 1) = C_1(t)$, $\varphi(0, s) = p$, and $\varphi(1, s) = q$. (See Fig. 11.11.) For each value of $s$ we get the curve $C_s$ given by $C_s(t) = \varphi(t, s)$. We think of $\varphi$ as providing a differentiable “deformation” of the curve $C_0$ into the curve $C_1$.




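Proposition 5.1. Let $C_0$ and $C_1$ be differentiably homotopic curves, and let $\omega$ be a linear differential form on $M$ with $d\omega = 0$. Then

\[ \int_{C_0}\omega = \int_{C_1}\omega. \tag{5.2} \]

Proof. In fact, writing $\square$ for the unit square $[0, 1]\times[0, 1]$,

\[ \int_{\partial\square}\iota^*\varphi^*\omega = \int_\square \varphi^*\,d\omega = 0. \]

But $\int_{\partial\square}\iota^*\varphi^*\omega$ is the sum of the four terms corresponding to the four sides of the square. The two vertical sides ($t = 0$ and $t = 1$) contribute nothing, since $\varphi$ maps these curves into points. The top gives $-\int_{C_1}\omega$ (because of the counterclockwise orientation), and the bottom gives $\int_{C_0}\omega$. Thus $\int_{C_0}\omega - \int_{C_1}\omega = 0$, proving the proposition. $\square$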


It is easy to see that the proposition extends without difficulty to piecewise differentiable curves and piecewise differentiable homotopies. Let us say that two piecewise differentiable curves, $C_0$ and $C_1$, are (piecewise differentiably) homotopic if there is a continuous map $\varphi$ of $[0, 1]\times[0, 1] \to M$ such that

i) $\varphi(0, s) = p$, $\varphi(1, s) = q$;

ii) $\varphi(t, 0) = C_0(t)$, $\varphi(t, 1) = C_1(t)$;

iii) there are a finite number of points $t_0 < t_1 < \cdots < t_m$ such that $\varphi$ coincides with the restriction of a differentiable map defined in some neighborhood of each rectangle $[t_i, t_{i+1}]\times[0, 1]$. (See Fig. 11.12.)

To verify that Proposition 5.1 holds for the case of piecewise differentiable homotopies, we apply Stokes' theorem to each rectangle and observe that the contributions of the interior vertical lines cancel one another.

We say that a manifold $M$ is connected if every pair of points can be joined by a (piecewise differentiable) curve. Thus $\mathbb{R}^n$, for example, is connected. We say that $M$ is simply connected if all (piecewise differentiable) curves joining the same two points are (piecewise differentiably) homotopic. (Note that the circle $S^1$ is not simply connected.) Let us verify that $\mathbb{R}^n$ is simply connected. If $C_0$ and $C_1$ are two curves, let $\varphi\colon [0, 1]\times[0, 1] \to \mathbb{R}^n$ be given by

\[ \varphi(t, s) = s\,C_1(t) + (1 - s)\,C_0(t). \]

It is clear that $\varphi$ has all the desired properties.




Proposition 5.2. Let $M$ be a connected and simply connected manifold, and let $O \in M$. Let $\omega \in \Lambda^1(M)$ satisfy $d\omega = 0$. For any $x \in M$ let $f(x) = \int_C \omega$, where $C$ is some piecewise differentiable curve joining $O$ to $x$. The function $f$ is well defined and differentiable, and $df = \omega$.




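Proof. It follows from Proposition 5.1 that $f$ is well defined. If $C_0$ and $C_1$ are two curves joining $O$ to $x$, then they are homotopic, and so $\int_{C_0}\omega = \int_{C_1}\omega$. It is clear that $f$ is continuous, since

\[ f(x) - f(y) = \int_D \omega, \]

where $D$ is any curve joining $y$ to $x$ (Fig. 11.13).
To check that $f$ is differentiable, let $(U, \alpha)$ be a chart about $x$ with coordinates $\langle x^1,\ldots,x^n\rangle$. Then

\[ f_\alpha(x^1,\ldots,x^i + h,\ldots,x^n) - f_\alpha(x^1,\ldots,x^n) = \int_C \omega, \]

where $C$ is any curve joining $p$ to $q$, where $\alpha(p) = (x^1,\ldots,x^i,\ldots,x^n)$, and where $\alpha(q) = (x^1,\ldots,x^i + h,\ldots,x^n)$. We can take $C$ to be the curve given by

\[ \alpha\circ C(t) = (x^1,\ldots,x^i + ht,\ldots,x^n). \]

If $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$, then

\[ \int_C \omega = \int_0^1 h\,a_i(x^1,\ldots,x^i + ht,\ldots,x^n)\,dt = \int_0^h a_i(x^1,\ldots,x^i + s,\ldots,x^n)\,ds. \]

(See Fig. 11.14.) Thus

\[ \lim_{h\to 0}\frac{1}{h}\bigl[f_\alpha(x^1,\ldots,x^i + h,\ldots,x^n) - f_\alpha(x^1,\ldots,x^n)\bigr] = a_i, \]

that is, $\partial f/\partial x^i = a_i$. This shows that $f$ is differentiable and that $df = \omega$, proving the proposition. $\square$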
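
The construction in Proposition 5.2 can be carried out explicitly in a simple case. The following sketch assumes $M = \mathbb{R}^2$, base point $O$ the origin, and the closed form $\omega = y\,dx + x\,dy$; the straight segment $C(t) = (tx, ty)$, $0 \le t \le 1$, is one convenient choice of curve from $O$ to $(x, y)$.

```latex
% \omega is closed: d\omega = dy \wedge dx + dx \wedge dy = 0.
% Integrate along C(t) = (tx, ty), whose derivative is C'(t) = (x, y):
\[
  f(x, y) = \int_C \omega
    = \int_0^1 \bigl( (ty)\,x + (tx)\,y \bigr)\,dt
    = 2xy \int_0^1 t\,dt
    = xy .
\]
% Check the conclusion of the proposition:
\[
  df = d(xy) = y\,dx + x\,dy = \omega .
\]
```

Any other piecewise differentiable curve from the origin to $(x, y)$ would give the same value, since $\mathbb{R}^2$ is simply connected.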


Fig. 11.13 Fig. 11.14 


We have thus established that every $\omega \in \Lambda^1(\mathbb{R}^n)$ with $d\omega = 0$ is of the form $df$. More generally, it can be established that if $\Omega \in \Lambda^k(\mathbb{R}^n)$ satisfies $d\Omega = 0$, then $\Omega = d\omega$ for some $\omega \in \Lambda^{k-1}(\mathbb{R}^n)$.


*This is not true for an arbitrary manifold. For instance, every $\omega \in \Lambda^1(S^1)$ satisfies $d\omega = 0$. Yet the element of angle form (which is, unfortunately, denoted by $d\theta$) is not the $d$ of any function. The fact that $d^2 = 0$ shows that if $\Omega = d\omega$, then $d\Omega = 0$. Thus the space $d[\Lambda^{k-1}(M)] \subset \Lambda^k(M)$ is a subspace of the space $\ker_k d$ of elements in $\Lambda^k(M)$ satisfying $d\Omega = 0$. The quotient space $\ker_k d/d[\Lambda^{k-1}]$ is denoted by $H^k(M)$ and is called the $k$th cohomology group of $M$. If $M$ is compact, it can be shown that $H^k$ is finite-dimensional. It measures (roughly speaking) “how many” $k$-dimensional holes there are in $M$.*
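
The claim about the angle form on the circle can be justified with one line of computation. The sketch below assumes the standard parametrization $C(t) = (\cos t, \sin t)$, $0 \le t \le 2\pi$, of $S^1$ as a closed curve, and argues by contradiction using (2.10).

```latex
% Integrating the angle form once around the circle gives
\[
  \int_{C([0,2\pi])} d\theta = \int_0^{2\pi} dt = 2\pi .
\]
% If d\theta were equal to df for a function f on S^1, then by (2.10)
\[
  \int_{C([0,2\pi])} df = f(C(2\pi)) - f(C(0)) = 0 ,
\]
% because C(2\pi) = C(0). Since 2\pi \neq 0, no such f exists, and the class
% of d\theta in H^1(S^1) = \ker_1 d / d[\Lambda^0(S^1)] is not zero.
```

This is the simplest example of a closed form that detects a “hole”.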




6. THE LIE DERIVATIVE OF A DIFFERENTIAL FORM


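Let $M$ be a differentiable manifold, and let $\varphi$ be a flow on $M$ with infinitesimal generator $X$. For any $\omega \in \Lambda^k(M)$ we can consider the expression

\[ \frac{\varphi_t^*\omega - \omega}{t}. \]

It is not difficult (using local expressions) to verify that the limit as $t \to 0$ exists and is again an element of $\Lambda^k(M)$, which we denote by $D_X\omega$. The purpose of this section is to provide an effective formula for computing $D_X\omega$. For this purpose, we first collect some properties of $D_X$. First of all, we have that it is linear:

\[ D_X(\omega_1 + \omega_2) = D_X\omega_1 + D_X\omega_2. \tag{6.1} \]

Secondly, we have

\[
\begin{aligned}
\varphi_t^*(\omega_1\wedge\omega_2) - \omega_1\wedge\omega_2 &= (\varphi_t^*\omega_1)\wedge(\varphi_t^*\omega_2) - \omega_1\wedge\omega_2 \\
&= (\varphi_t^*\omega_1)\wedge(\varphi_t^*\omega_2) - (\varphi_t^*\omega_1)\wedge\omega_2 + (\varphi_t^*\omega_1)\wedge\omega_2 - \omega_1\wedge\omega_2.
\end{aligned}
\]

Dividing by $t$ and passing to the limit, we see that

\[ D_X(\omega_1\wedge\omega_2) = (D_X\omega_1)\wedge\omega_2 + \omega_1\wedge D_X\omega_2. \tag{6.2} \]

Finally, since $\varphi_t^*d = d\varphi_t^*$, we have

\[ D_X\,d\omega = d(D_X\omega). \tag{6.3} \]

Actually, these three formulas suffice for the computation. If

\[ \omega = \sum a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}, \]

then

\[
\begin{aligned}
D_X\omega &= \sum D_X(a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k}) && \text{by (6.1)} \\
&= \sum\bigl[(D_Xa_{i_1\ldots i_k})\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1\ldots i_k}\,(D_X\,dx^{i_1})\wedge\cdots\wedge dx^{i_k} \\
&\qquad + \cdots + a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge(D_X\,dx^{i_k})\bigr] && \text{by repeated use of (6.2)} \\
&= \sum\bigl[(D_Xa_{i_1\ldots i_k})\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1\ldots i_k}\,d(D_Xx^{i_1})\wedge\cdots\wedge dx^{i_k} \\
&\qquad + \cdots + a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge d(D_Xx^{i_k})\bigr] && \text{by (6.3).}
\end{aligned}
\]

Since this expression is rather cumbersome (the $d(D_Xx^i)$ have to be expanded and the terms collected), we shall derive a simpler and more convenient expression for $D_X\omega$. In order to do this, we make an algebraic detour.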
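
Before the detour, it may help to see the three rules (6.1) to (6.3) applied once by hand. The example is an illustrative assumption: on $\mathbb{R}^2$ take the rotation field $X = -y\,\partial/\partial x + x\,\partial/\partial y$ and the form $\omega = x\,dy$. The sketch also assumes, as for any vector field acting on functions, that $D_Xf = Xf$.

```latex
% Action of X on the coordinate functions:
\[
  D_X x = -y , \qquad D_X y = x .
\]
% Apply (6.2) to the product x \cdot dy, then (6.3) to D_X(dy):
\[
  D_X (x\,dy)
    = (D_X x)\,dy + x\,D_X(dy)
    = (D_X x)\,dy + x\,d(D_X y)
    = -y\,dy + x\,dx .
\]
```

Even in this small case one sees the two kinds of terms in the general expansion: one from differentiating the coefficient, one from differentiating each $dx^i$.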
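Recall that the operator $d\colon \Lambda^k(M) \to \Lambda^{k+1}(M)$ is linear and satisfies the identity

\[ d(\omega_1\wedge\omega_2) = d\omega_1\wedge\omega_2 + (-1)^k\,\omega_1\wedge d\omega_2 \tag{6.4} \]

if $\omega_1 \in \Lambda^k(M)$. More generally, any (sequence of linear) maps $\theta$ of

\[ \Lambda^k(M) \to \Lambda^{k+1}(M) \]

satisfying the identity

\[ \theta(\omega_1\wedge\omega_2) = \theta\omega_1\wedge\omega_2 + (-1)^k\,\omega_1\wedge\theta\omega_2 \tag{6.4$'$} \]

and

\[ \operatorname{supp}\theta\omega \subset \operatorname{supp}\omega^{\dagger} \tag{6.5} \]

will be called an antiderivation of the algebra $\Lambda(M)$.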

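It follows from (6.5) that if $\omega_1 = \omega_2$ on an open set $U$, then $\theta(\omega_1) = \theta(\omega_2)$ on $U$. Now about every $x \in M$ we can find a neighborhood $U$ and functions $x^1,\ldots,x^n$, so that $\omega \in \Lambda^k(M)$ can be written as

\[ \omega = \sum a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge dx^{i_k} \quad\text{on } U. \tag{6.6} \]

Then by repeated use of (6.4$'$) we have

\[
\begin{aligned}
\theta(\omega) = \sum\bigl[&\theta(a_{i_1\ldots i_k})\wedge dx^{i_1}\wedge\cdots\wedge dx^{i_k} + a_{i_1\ldots i_k}\,\theta(dx^{i_1})\wedge\cdots\wedge dx^{i_k} \\
&+ \cdots + (-1)^{k-1}a_{i_1\ldots i_k}\,dx^{i_1}\wedge\cdots\wedge\theta(dx^{i_k})\bigr].
\end{aligned} \tag{6.7}
\]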


We thus arrive at the important conclusion: 


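Proposition 6.1. Any antiderivation $\theta: \Lambda^k(M) \to \Lambda^{k+1}(M)$, $k = 0, \ldots, n$,
is uniquely determined by its action on $\Lambda^0(M)$ and $\Lambda^1(M)$. That is, if
$\theta_1(\omega) = \theta_2(\omega)$ for all $\omega \in \Lambda^0(M)$ and $\Lambda^1(M)$, then $\theta_1(\Omega) = \theta_2(\Omega)$ for
$\Omega \in \Lambda^k(M)$ for any $k$.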


Now suppose we are given maps
\[ \theta: \Lambda^0(M) \to \Lambda^1(M) \quad\text{and}\quad \theta: \Lambda^1(M) \to \Lambda^2(M) \]
which satisfy (6.5) and (6.4') where it makes sense, that is,
\[ \theta(f_1 f_2) = \theta(f_1) f_2 + f_1 \theta(f_2) \quad\text{and}\quad \theta(f\omega) = \theta(f) \wedge \omega + f\,\theta(\omega). \tag{6.8} \]
Then any chart $(U, \alpha)$ defines $\theta_U: \Lambda^k(U) \to \Lambda^{k+1}(U)$ by (6.7). This
gives an antiderivation $\theta_U$ on $U$, as can easily be checked by the use of the
argument on pp. 440-441. By the uniqueness argument, if $(W, \beta)$ is a second chart,
the antiderivations $\theta_U$ and $\theta_W$ coincide on $U \cap W$. Therefore, Eq. (6.7) is
consistent and yields a well-defined antiderivation on $\Lambda(M)$. (Observe that we
have just repeated about two-thirds of the proof of Theorem 3.1 for the more
general context of any antiderivation.)


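† This condition is actually a consequence of (6.4'). In fact, let $U$ be an open set
containing $\operatorname{supp} \omega$. Since $\{U, M - \operatorname{supp} \omega\}$ is an open covering of $M$, we can find a
partition of unity subordinate to it. In particular, we can find a $C^\infty$-function $\varphi$ which
is identically one on $\operatorname{supp} \omega$ and vanishes outside $U$. Then $\omega = \varphi\omega$, so that
\[ \theta(\omega) = \theta(\varphi\omega) = \theta(\varphi) \wedge \omega + \varphi\,\theta(\omega). \]
Thus $\operatorname{supp} \theta(\omega) \subset \operatorname{supp} \omega \cup \operatorname{supp} \varphi \subset U$. Since $U$ is an arbitrary neighborhood of
$\operatorname{supp} \omega$, we conclude that $\operatorname{supp} \theta(\omega) \subset \operatorname{supp} \omega$.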




Also observe that in the above arguments, nothing changes if instead of
$\theta: \Lambda^k(M) \to \Lambda^{k+1}(M)$ we have $\theta: \Lambda^k(M) \to \Lambda^{k-1}(M)$. [We take this to
mean $\theta(f) = 0$ for $f \in \Lambda^0(M)$.] In fact, the same argument works for
\[ \theta: \Lambda^k \to \Lambda^{k+r} \]
for any odd integer $r$. We can thus state:

Proposition 6.2. Let $\theta: \Lambda^0(M) \to \Lambda^r(M)$ and $\theta: \Lambda^1(M) \to \Lambda^{r+1}(M)$ be
linear maps satisfying (6.5) and (6.8), where $r$ is odd. Then there exists one
and only one way of extending $\theta$ to an antiderivation $\theta: \Lambda^k(M) \to \Lambda^{k+r}(M)$
satisfying (6.4').


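As an application of this proposition, we will attach an antiderivation
$\theta(X): \Lambda^k(M) \to \Lambda^{k-1}(M)$ to every smooth vector field $X$ on $M$. Since $r = -1$,
for $f \in \Lambda^0(M)$ we set
\[ \theta(X)f = 0. \]
For $\omega \in \Lambda^1(M)$ we set
\[ \theta(X)\omega = \langle X, \omega\rangle. \tag{6.9} \]
To verify (6.8) means to check that
\[ \theta(X)(f\omega) = f\,\theta(X)\omega, \]
that is, that
\[ \langle X, f\omega\rangle = f\langle X, \omega\rangle, \]
which is obvious.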

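If $f$ is a function and $\theta$ is an antiderivation, we denote by $f\theta$ the map which
sends $\omega \mapsto f\theta(\omega)$. It is easy to check that this is again an antiderivation.

We can assert the following as a consequence of the uniqueness theorem:
Let $X$ and $Y$ be smooth vector fields, and let $f$ and $g$ be smooth functions. Then
\[ \theta(fX + gY) = f\theta(X) + g\theta(Y). \tag{6.10} \]
By the proposition, it suffices to check (6.10) on all $\omega \in \Lambda^1(M)$. By (6.9), this
is just
\[ \langle fX + gY, \omega\rangle = f\langle X, \omega\rangle + g\langle Y, \omega\rangle, \]
which is obvious.
In particular, in a chart $(U, \alpha)$, if
\[ X = X^1 \frac{\partial}{\partial x^1} + \cdots + X^n \frac{\partial}{\partial x^n}, \]
then
\[ \theta(X) = \sum X^i\, \theta\Bigl(\frac{\partial}{\partial x^i}\Bigr). \]
To evaluate $\theta(\partial/\partial x^i)$, we use (6.8) and the fact that
\[ \theta\Bigl(\frac{\partial}{\partial x^i}\Bigr) f = 0, \qquad
   \theta\Bigl(\frac{\partial}{\partial x^i}\Bigr) dx^j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \]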




Thus, for example, $\theta(\partial/\partial x^p)(dx^q \wedge dx^r) = 0$ if neither $p = q$ nor $p = r$, while
$\theta(\partial/\partial x^p)(dx^p \wedge dx^q) = dx^q$, $\theta(\partial/\partial x^q)(dx^p \wedge dx^q) = -dx^p$, etc.
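
As a small worked illustration (an added sketch, not part of the original text), the rules above can be applied to a general field $X = X^1\,\partial/\partial x^1 + X^2\,\partial/\partial x^2 + X^3\,\partial/\partial x^3$. The choice of three coordinates and of the two-form $dx^1 \wedge dx^2$ is an assumption made only for this example.

```latex
% theta(X) is linear in X by (6.10) and an antiderivation of degree -1,
% so on a product of two one-forms the second term picks up a minus sign.
\begin{align*}
\theta(X)(dx^1 \wedge dx^2)
  &= \bigl(\theta(X)\,dx^1\bigr)\,dx^2 - dx^1\,\bigl(\theta(X)\,dx^2\bigr) \\
  &= X^1\,dx^2 - X^2\,dx^1 .
\end{align*}
% X^3 does not appear: theta(d/dx^3) annihilates both dx^1 and dx^2.
```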

Let us call a (sequence of) map(s) $D: \Lambda^k(M) \to \Lambda^{k+s}(M)$, where $s$ is even,
a derivation if it satisfies (6.5) and
\[ D(\omega_1 \wedge \omega_2) = D\omega_1 \wedge \omega_2 + \omega_1 \wedge D\omega_2. \tag{6.11} \]
Since $s$ is even, this is consistent. The most important example is $D_X$,
where $s = 0$. Then (6.11) is just (6.2).

All the previous arguments about existence and uniqueness of extensions
apply unchanged to derivations, as can easily be checked. We can therefore
assert:

Proposition 6.3. Let $D: \Lambda^0(M) \to \Lambda^{s}(M)$ and $D: \Lambda^1(M) \to \Lambda^{1+s}(M)$,
where $s$ is even, be maps satisfying (6.5) and (6.8) (with $\theta$ replaced by $D$).
Then there exists one and only one way of extending $D$ to a derivation of
$\Lambda(M)$.

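We need one further algebraic fact.

Proposition 6.4. Let $\theta_1: \Lambda^k \to \Lambda^{k+r_1}$ and $\theta_2: \Lambda^k \to \Lambda^{k+r_2}$ be antiderivations.
Then $\theta_1\theta_2 + \theta_2\theta_1: \Lambda^k \to \Lambda^{k+r_1+r_2}$ is a derivation.

Proof. Since $r_1$ and $r_2$ are both odd, $r_1 + r_2$ is even. Equation (6.5) obviously
holds. To verify (6.11), let $\omega_1 \in \Lambda^k(M)$. Then
\begin{align*}
\theta_1\theta_2(\omega_1 \wedge \omega_2) &= \theta_1[\theta_2\omega_1 \wedge \omega_2 + (-1)^k \omega_1 \wedge \theta_2\omega_2] \\
 &= \theta_1\theta_2\omega_1 \wedge \omega_2 + (-1)^{k+r_2}\, \theta_2\omega_1 \wedge \theta_1\omega_2 \\
 &\quad + (-1)^k\, \theta_1\omega_1 \wedge \theta_2\omega_2 + \omega_1 \wedge \theta_1\theta_2\omega_2 .
\end{align*}
Similarly,
\begin{align*}
\theta_2\theta_1(\omega_1 \wedge \omega_2) &= \theta_2\theta_1\omega_1 \wedge \omega_2 + (-1)^{k+r_1}\, \theta_1\omega_1 \wedge \theta_2\omega_2 \\
 &\quad + (-1)^k\, \theta_2\omega_1 \wedge \theta_1\omega_2 + \omega_1 \wedge \theta_2\theta_1\omega_2 .
\end{align*}
Since $r_1$ and $r_2$ are both odd, the middle terms cancel when we add. Hence we get
\[ (\theta_1\theta_2 + \theta_2\theta_1)(\omega_1 \wedge \omega_2) = (\theta_1\theta_2 + \theta_2\theta_1)\omega_1 \wedge \omega_2 + \omega_1 \wedge (\theta_1\theta_2 + \theta_2\theta_1)\omega_2. \qquad \square \]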
As a first application of Proposition 6.3, we observe that
\[ \theta(X) \circ \theta(Y) = -\theta(Y) \circ \theta(X). \tag{6.12} \]
In fact, by Proposition 6.4, $\theta(X)\theta(Y) + \theta(Y)\theta(X)$ is a derivation of degree
$-2$, that is, it vanishes on $\Lambda^0$ and $\Lambda^1$. It must therefore vanish identically.
We could, of course, directly verify (6.12) from the local description of $\theta(X)$
and $\theta(Y)$.
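
The direct verification of (6.12) mentioned above can be sketched on a single basis two-form. This is an added illustration; the components $X^i$, $Y^i$ and the form $dx^1 \wedge dx^2$ are chosen here only as an example.

```latex
% Apply theta(Y) first, then theta(X); each is an antiderivation of degree -1.
\begin{align*}
\theta(Y)(dx^1 \wedge dx^2) &= Y^1\,dx^2 - Y^2\,dx^1, \\
\theta(X)\theta(Y)(dx^1 \wedge dx^2) &= Y^1 X^2 - Y^2 X^1, \\
\theta(Y)\theta(X)(dx^1 \wedge dx^2) &= X^1 Y^2 - X^2 Y^1 .
\end{align*}
% The last two lines are negatives of each other, as (6.12) requires.
```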

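As a more serious use of Proposition 6.4, consider $\theta(X) \circ d + d \circ \theta(X)$,
where $X$ is a smooth vector field. Since $d: \Lambda^k \to \Lambda^{k+1}$ and $\theta(X): \Lambda^k \to \Lambda^{k-1}$,
we conclude that $\theta(X) \circ d + d \circ \theta(X): \Lambda^k \to \Lambda^k$. We now assert the main
formula of this section:
\[ D_X = \theta(X) \circ d + d \circ \theta(X). \tag{6.13} \]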




Since both sides of (6.13) are derivations, it suffices to check (6.13) for
functions and linear differential forms. If $f \in \Lambda^0(M)$, then $\theta(X)f = 0$. Thus,
by (6.9), Eq. (6.13) becomes
\[ D_X f = \langle X, df\rangle, \]
which we know holds. Next we must verify (6.13) for $\omega \in \Lambda^1(M)$. By (6.5), it
suffices to verify (6.13) locally. If we write $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$, it
suffices, by linearity, to verify (6.13) for each term $a_i\,dx^i$. Since both sides of
(6.13) are derivations, we have
\[ D_X(a_i\,dx^i) = (D_X a_i)\,dx^i + a_i\,(D_X\,dx^i) \]
and
\[ [\theta(X)\,d + d\,\theta(X)](a_i\,dx^i) = [\theta(X)\,d + d\,\theta(X)](a_i)\,dx^i + a_i\,[\theta(X)\,d + d\,\theta(X)]\,dx^i. \]
Since we have verified (6.13) for functions, we only have to check (6.13) for $dx^i$.
Now
\[ D_X\,dx^i = d\,D_X x^i \]
and
\[ [\theta(X)\,d + d\,\theta(X)]\,dx^i = d\,\theta(X)\,dx^i = d\langle X, dx^i\rangle = d\,D_X x^i. \]
This completes the proof of (6.13).
In many circumstances it will be convenient to free the letter $\theta$ for other uses.
We shall therefore occasionally adopt the notation
\[ X \lrcorner\, \omega = \theta(X)\omega. \]
The symbol $\lrcorner$ is called the interior product; $X \lrcorner\, \omega$ is the interior product of
the form $\omega$ with the vector field $X$. If $\omega \in \Lambda^k$, then $X \lrcorner\, \omega \in \Lambda^{k-1}$. Equation
(6.13) can then be rewritten as
\[ D_X\omega = X \lrcorner\, d\omega + d(X \lrcorner\, \omega). \tag{6.14} \]


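Let us see what (6.14) says in some special cases in terms of local coordinates.
If $\omega = a_1\,dx^1 + \cdots + a_n\,dx^n$ and
\[ X = X^1 \frac{\partial}{\partial x^1} + \cdots + X^n \frac{\partial}{\partial x^n}, \]
then
\[ d\omega = \sum da_i \wedge dx^i. \]
Hence
\begin{align*}
X \lrcorner\, d\omega &= \sum_i \bigl((X \lrcorner\, da_i) \wedge dx^i - da_i \wedge (X \lrcorner\, dx^i)\bigr) \\
 &= \sum_{i,j} \Bigl(X^j \frac{\partial a_i}{\partial x^j}\,dx^i - X^i \frac{\partial a_i}{\partial x^j}\,dx^j\Bigr),
\end{align*}
while
\[ X \lrcorner\, \omega = \sum_i a_i X^i, \]
so
\[ d(X \lrcorner\, \omega) = \sum_{i,j} X^i \frac{\partial a_i}{\partial x^j}\,dx^j + \sum_{i,j} a_i \frac{\partial X^i}{\partial x^j}\,dx^j. \]
Thus
\[ D_X\omega = \sum_{i,j} \Bigl(X^j \frac{\partial a_i}{\partial x^j} + a_j \frac{\partial X^j}{\partial x^i}\Bigr) dx^i, \]
which agrees with Eq. (7.12') of Chapter 9.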
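
To see formula (6.14) at work, here is an added worked example (not from the original text). It assumes the plane $\mathbb{R}^2$ with coordinates $x, y$, the rotation field $X = -y\,\partial/\partial x + x\,\partial/\partial y$, and the one-form $\omega = x\,dy$; all three choices are made only for illustration.

```latex
% Step 1: d(omega) and its interior product with X.
\begin{align*}
d\omega &= dx \wedge dy, \\
X \lrcorner\, d\omega &= (X \lrcorner\, dx)\,dy - dx\,(X \lrcorner\, dy) = -y\,dy - x\,dx .
\end{align*}
% Step 2: the interior product of omega with X, then d of it.
\begin{align*}
X \lrcorner\, \omega &= x \cdot x = x^2, &
d(X \lrcorner\, \omega) &= 2x\,dx .
\end{align*}
% Step 3: add the two pieces, as (6.14) prescribes.
\[
D_X\omega = (-y\,dy - x\,dx) + 2x\,dx = x\,dx - y\,dy .
\]
% Cross-check from the definition: the flow of X is rotation through angle t, so
% phi_t^*(x dy) = (x cos t - y sin t)(sin t dx + cos t dy),
% whose t-derivative at t = 0 is x dx - y dy, in agreement with the above.
```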
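As a second illustration, let $\Omega = a\,dx^1 \wedge \cdots \wedge dx^n$, where $n = \dim M$.
Then $d\Omega = 0$, so (6.14) reduces to
\[ D_X\Omega = d(X \lrcorner\, \Omega). \]
If $X = \sum X^i(\partial/\partial x^i)$, then
\begin{align*}
X \lrcorner\, \Omega &= \sum X^i \Bigl(\frac{\partial}{\partial x^i}\Bigr) \lrcorner\, \Omega \\
 &= \sum a X^i \Bigl(\frac{\partial}{\partial x^i}\Bigr) \lrcorner\, (dx^1 \wedge \cdots \wedge dx^n) \\
 &= \sum (-1)^{i-1} a X^i\, dx^1 \wedge \cdots \wedge dx^{i-1} \wedge dx^{i+1} \wedge \cdots \wedge dx^n,
\end{align*}
which is merely the formula introduced at the end of Section 4. Then
\[ D_X\Omega = d(X \lrcorner\, \Omega) = \Bigl(\sum \frac{\partial (aX^i)}{\partial x^i}\Bigr) dx^1 \wedge \cdots \wedge dx^n. \]
Since we can always locally identify a density with an $n$-form by identifying $\rho$
with $\rho_\alpha\, dx^1 \wedge \cdots \wedge dx^n$ on $(U, \alpha)$, we obtain another proof of Proposition 4.2
of Chapter 10.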
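
A short added example of the top-degree case, under assumptions chosen only for illustration: the plane with $\Omega = dx \wedge dy$ (so $a = 1$) and the radial field $X = x\,\partial/\partial x + y\,\partial/\partial y$.

```latex
\begin{align*}
X \lrcorner\, \Omega &= (X \lrcorner\, dx)\,dy - dx\,(X \lrcorner\, dy) = x\,dy - y\,dx, \\
D_X\Omega &= d(x\,dy - y\,dx) = dx \wedge dy - dy \wedge dx = 2\,dx \wedge dy .
\end{align*}
% The coefficient 2 is the sum of d(X^i)/dx^i, i.e. the divergence of X.
% Cross-check: the flow is phi_t(x, y) = (e^t x, e^t y), so
% phi_t^* Omega = e^{2t} Omega, whose t-derivative at t = 0 is 2 Omega.
```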


Appendix I. “VECTOR ANALYSIS”


We list here the relationships between notions introduced in this chapter and
various concepts found in books on “vector analysis”, although we shall have
no occasion to use them.

In oriented Euclidean three-space $\mathbb{E}^3$, there are a number of identifications
we can make which give a special form to some of the operations we have
introduced in this chapter.

First of all, in $\mathbb{E}^3$, as in any Riemann space, we can (and shall) identify
vector fields with linear differential forms. Thus for any function $f$ we can
regard $df$ as a vector field. As such, it is called $\operatorname{grad} f$. Thus, in $\mathbb{E}^3$, in terms of
rectangular coordinates $x, y, z$,
\[ \operatorname{grad} f = \Bigl\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \Bigr\rangle, \]
where we have also identified vector fields on $\mathbb{E}^3$ with $\mathbb{E}^3$-valued functions.
Secondly, since $\mathbb{E}^3$ is oriented, we can, via the $*$-operator (acting on each $T_x^*$),
identify $\Lambda^2(\mathbb{E}^3)$ with $\Lambda^1(\mathbb{E}^3)$. Recall that $*$ is given by
\[ *(dx \wedge dy) = dz, \qquad *(dx \wedge dz) = -dy, \qquad *(dy \wedge dz) = dx. \tag{I.1} \]




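In particular, if $\omega_1 = \langle P, Q, R\rangle = P\,dx + Q\,dy + R\,dz$ and $\omega_2 =
\langle L, M, N\rangle = L\,dx + M\,dy + N\,dz$, we can introduce the so-called “vector
product” of $\omega_1$ with $\omega_2$. It is defined by
\[ \omega_1 \times \omega_2 = *(\omega_1 \wedge \omega_2) \]
and is given [in view of (I.1)] by
\[ \langle P, Q, R\rangle \times \langle L, M, N\rangle = \langle QN - RM,\; RL - PN,\; PM - QL\rangle. \]
Also we introduce the operator
\[ \operatorname{curl} \omega = *d\omega. \]
Thus, if $\omega = \langle P, Q, R\rangle$, we have
\[ \operatorname{curl} \omega = \Bigl\langle \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z},\; \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x},\; \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Bigr\rangle. \]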

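Consider an oriented surface in $\mathbb{E}^3$; i.e., let $\varphi: S \to \mathbb{E}^3$. Let $\Omega$ be the volume
form on $S$ associated with the Riemann metric induced by $\varphi$. By definition, if
$\xi_1, \xi_2 \in T_x(S)$, then
\[ \Omega(\xi_1, \xi_2) = dV(\varphi_*\xi_1, \varphi_*\xi_2, n), \]
where $dV$ is the volume element of $\mathbb{E}^3$ and $n$ is the unit normal vector. Another
way of writing this is to say that
\[ \Omega(\xi_1, \xi_2) = \sigma(\varphi_*\xi_1, \varphi_*\xi_2), \]
where $\sigma = *n$ when we regard $n$ as a differential form. Now let $\Theta$ be a form in
$\mathbb{E}^3$, and suppose that $\varphi^*\Theta = f\Omega$ for some function $f$. Then
\[ f(x) = (\Theta, *n)(\varphi(x)). \]
Thus
\[ \int_S \varphi^*\Theta = \int_S f\,\Omega = \int_S (\Theta, *n)\,\Omega = \int_S (*\Theta, n)\,\Omega. \]
Applying this to $\Theta = d\omega$, where $\omega = P\,dx + Q\,dy + R\,dz$, we can rewrite
Stokes' theorem as
\[ \int_C \omega = \int_C P\,dx + Q\,dy + R\,dz = \int_S (\operatorname{curl} \omega, n)\,\Omega, \]
where $S$ is some surface spanning the closed curve $C$.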
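If we apply the remark to the case $\Theta = *\omega$ and $S = \partial D$, we obtain, since
$** = \mathrm{id}$ (for $n = 3$),
\[ \int_{\partial D} (\omega, n)\,\Omega = \int_D d{*}\omega. \]
Note that
\[ d{*}\omega = \Bigl(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\Bigr) dx \wedge dy \wedge dz, \]
which we write as $\operatorname{div} \omega$; that is,
\[ \operatorname{div} \omega = d{*}\omega. \]
(It is in fact $\operatorname{div} \langle \omega, dV\rangle$, where $dV$ is the volume element and we regard $\omega$ as a
vector field.) Thus we get the divergence theorem again. Note that
\[ \operatorname{curl} (\operatorname{grad} f) = *d\,df = 0 \]
and
\[ \operatorname{div} (\operatorname{curl} \omega) = d{*}{*}\,d\omega = d^2\omega = 0, \]
since $d^2 = 0$.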
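
The dictionary between forms and the classical operators can be checked on two simple fields. This is an added sketch; the particular fields $\langle -y, x, 0\rangle$ and $\langle x, y, z\rangle$ are assumptions chosen only as examples, and $*dy = -dx \wedge dz$ is used, which follows from (I.1) together with $** = \mathrm{id}$.

```latex
% Example 1: omega = <-y, x, 0> = -y dx + x dy (a rotation about the z-axis).
\begin{align*}
d\omega &= -dy \wedge dx + dx \wedge dy = 2\,dx \wedge dy, \\
\operatorname{curl}\omega &= *d\omega = 2\,dz = \langle 0, 0, 2\rangle .
\end{align*}
% Example 2: omega = <x, y, z> = x dx + y dy + z dz (the radial field).
\begin{align*}
*\omega &= x\,dy \wedge dz - y\,dx \wedge dz + z\,dx \wedge dy, \\
d{*}\omega &= dx \wedge dy \wedge dz - dy \wedge dx \wedge dz + dz \wedge dx \wedge dy
            = 3\,dx \wedge dy \wedge dz .
\end{align*}
% So the radial field has divergence 3, and since it equals grad((x^2+y^2+z^2)/2),
% its curl vanishes, consistent with curl(grad f) = 0.
```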


Appendix II. ELEMENTARY DIFFERENTIAL
GEOMETRY OF SURFACES IN $\mathbb{E}^3$


For purposes of computation, it is convenient to introduce the notion of a
vector-valued differential form. Let $E$ be a vector space, and let $M$ be a differentiable
manifold. By an $E$-valued exterior differential form $\Omega$ of degree $p$ we
shall mean a rule which assigns an element $\Omega_x$ to each $x \in M$, where $\Omega_x$ is
an antisymmetric $E$-valued multilinear function of degree $p$ on $T_x(M)$. For
instance, if $p = 0$, then an $E$-valued zero-form is just a function on $M$ with
values in $E$. An $E$-valued one-form is a rule which assigns an element of $E$ to
each tangent vector $\xi$ at any point of $M$, and so on.

Suppose that $E$ is finite-dimensional and that $\{e_1, \ldots, e_N\}$ is a basis for $E$.
Let $\Omega_1, \ldots, \Omega_N$ be (real-valued) $p$-forms. We can then consider the $E$-valued
$p$-form $\Omega = \Omega_1 e_1 + \cdots + \Omega_N e_N$, where, for any $p$ vectors $\xi_1, \ldots, \xi_p$ in $T_x(M)$,
we have
\[ \Omega_x(\xi_1, \ldots, \xi_p) = \Omega_{1x}(\xi_1, \ldots, \xi_p)\,e_1 + \cdots + \Omega_{Nx}(\xi_1, \ldots, \xi_p)\,e_N. \]
Conversely, if $\Omega$ is an $E$-valued form, then real-valued forms $\Omega_1, \ldots, \Omega_N$ can
be defined by the above equation. In short, once a basis for an $N$-dimensional
vector space $E$ has been chosen, giving an $E$-valued differential form $\Omega$ is the
same as giving $N$ real-valued forms, and we can write
\[ \Omega = \sum_{i=1}^{N} \Omega_i e_i \quad\text{or}\quad \Omega = \langle \Omega_1, \ldots, \Omega_N\rangle. \]


The rules for local description of $E$-valued forms, as well as the transition
laws, are similar to those of real-valued forms, so we won’t describe them in
detail. For the sake of simplicity, we shall restrict our attention to the case
where $E$ is finite-dimensional, although for the most part this assumption is
unnecessary.

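If $\omega$ is a real-valued differential form of degree $p$, and if $\Omega$ is an $E$-valued
form of degree $q$, then we can define the form $\omega \wedge \Omega$ in the obvious way. In
terms of a basis, if $\Omega = \langle \Omega_1, \ldots, \Omega_N\rangle$, then $\omega \wedge \Omega = \langle \omega \wedge \Omega_1, \ldots, \omega \wedge \Omega_N\rangle$.

More generally, let $E$ and $F$ be (finite-dimensional) vector spaces, and let $\#$
be a bilinear map of $E \times F \to G$, where $G$ is a third vector space. Let
$\{e_1, \ldots, e_N\}$ be a basis for $E$, let $\{f_1, \ldots, f_M\}$ be a basis for $F$, and let
$\{g_1, \ldots, g_K\}$ be a basis for $G$. Suppose that the map $\#$ is given by
\[ \#\{e_i, f_j\} = \sum_k a_{ij}^k\, g_k. \]
Then if $\omega = \sum \omega_i e_i$ is an $E$-valued form and $\Omega = \sum \Omega_j f_j$ is an $F$-valued form,
we define the $G$-valued form $\omega \wedge \Omega$ by
\[ \omega \wedge \Omega = \sum_k \Bigl(\sum_{i,j} a_{ij}^k\, \omega_i \wedge \Omega_j\Bigr) g_k. \]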


It is easy to check that this does not depend on the particular bases chosen. 

We shall want to apply this notion primarily in two contexts. First of all,
we will be interested in the case where $E = F$ and $G = \mathbb{R}$, so that $\#$ is a bilinear
form on $E$. Suppose $\#$ is a scalar product and $e_1, \ldots, e_N$ is an orthonormal basis.
Then we shall write $(\omega \wedge \Omega)$ to remind us of the scalar product. If
\[ \omega = \langle \omega_1, \ldots, \omega_N\rangle \quad\text{and}\quad \Omega = \langle \Omega_1, \ldots, \Omega_N\rangle, \]
then
\[ (\omega \wedge \Omega) = \sum \omega_i \wedge \Omega_i. \]
Note that in this case if $\omega$ is a $p$-form and $\Omega$ is a $q$-form, then
\[ (\omega \wedge \Omega) = (-1)^{pq}(\Omega \wedge \omega), \]
as in the case of real-valued forms.

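The second case we shall be interested in is where $F = G$ and $E = \operatorname{Hom}(F)$,
and $\#$ is just the evaluation map evaluating a linear transformation on a vector
of $F$ to give another element of $F$. This time, choosing a basis for $F$ determines a
basis for $\operatorname{Hom}(F)$, so we can regard $\omega$ as a matrix of real-valued differential forms.
If $\omega = (\omega_{ij})$ and $\Omega = \langle \Omega_1, \ldots, \Omega_M\rangle$, then
\[ \omega \wedge \Omega = \Bigl\langle \sum_j \omega_{1j} \wedge \Omega_j, \ldots, \sum_j \omega_{Mj} \wedge \Omega_j \Bigr\rangle. \]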


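The operator $d$ makes sense for vector-valued forms just as it did for real-valued
forms, and it satisfies the same rules. Thus, if $\Omega = \langle \Omega_1, \ldots, \Omega_N\rangle$,
then $d\Omega = \langle d\Omega_1, \ldots, d\Omega_N\rangle$ and
\[ d(\omega \wedge \Omega) = d\omega \wedge \Omega + (-1)^p\, \omega \wedge d\Omega \]
if $\omega$ is an $E$-valued form of degree $p$ and $\Omega$ is an $F$-valued form.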

We shall apply the notion of vector-valued forms to develop (mostly in
exercise form) some elementary facts about the geometry of oriented surfaces
in $\mathbb{E}^3$. Let $M$ be an oriented two-dimensional manifold, and let $\varphi$ be a
differentiable map of $M$ into $\mathbb{E}^3$. We shall assume that $\varphi_*$ is not singular at any point
of $M$, i.e., that $\varphi$ is an immersion. Thus at each point $p \in M$ the space $\varphi_*(T_p(M))$
is a two-dimensional subspace of $T_{\varphi(p)}(\mathbb{E}^3)$. Since we can identify $T_{\varphi(p)}(\mathbb{E}^3)$
with $\mathbb{E}^3$, we can regard $\varphi_*(T_p(M))$ as a two-dimensional subspace of $\mathbb{E}^3$.
(See Fig. 11.15.) Since $M$ is oriented, so is the tangent plane $\varphi_*(T_p(M))$.
Therefore, there is a unique unit vector orthogonal to the tangent plane which,
together with an oriented basis of the tangent plane, gives an oriented basis
of $\mathbb{E}^3$. This vector is called the normal vector and will be denoted by $n(p)$. We
can consider $n$ an $\mathbb{E}^3$-valued function on $M$. Since $\|n\| = 1$, we can regard $n$ as
a mapping from $M$ to the unit sphere. Note that $\varphi(M)$ lies in a fixed plane of $\mathbb{E}^3$
if and only if $n = \text{const}$ ($n$ = the normal vector to the plane). We therefore can
expect the variation of $n$ to be useful in describing how the surface $\varphi(M)$ is
“bending”.

Fig. 11.15

Let $\Omega$ be the (oriented) area form on $M$ corresponding to the Riemann metric
induced by $\varphi$. Let $\Omega_S$ be the (oriented) area form on the unit sphere. Then
$n^*(\Omega_S)$ is a two-form on $M$, and therefore we can write
\[ n^*\Omega_S = K\Omega. \]
The function $K$ is called the Gaussian curvature of the surface $\varphi(M)$. Note that
$K = 0$ if $\varphi(M)$ lies in a plane. Also, $K = 0$ if $\varphi(M)$ is a cylinder (see the
exercises).

For any oriented two-dimensional manifold $M$ with a Riemann metric we
let $\mathcal{F}$ denote the set of all oriented orthonormal bases in all tangent spaces of $M$. Thus an
element of $\mathcal{F}$ is given by $\langle f_1, f_2\rangle$, where $\langle f_1, f_2\rangle$ is an orthonormal basis of
$T_x(M)$ for some $x \in M$. Note that $f_2$ is determined by $f_1$, because of the orientation
and the fact that $f_2 \perp f_1$. Thus we can consider $\mathcal{F}$ the space of all tangent
vectors of unit length. For each $x \in M$ the set of all unit vectors is just a circle.
We leave it to the reader to verify that $\mathcal{F}$ is, in fact, a three-dimensional manifold.
We denote by $\pi$ the map that assigns to each $\langle f_1, f_2\rangle$ the point $x$ when
$\langle f_1, f_2\rangle$ is an orthonormal basis at $x$. Again, the reader should verify that $\pi$
is a differentiable map of $\mathcal{F}$ onto $M$.

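In the case at hand, where the metric comes from an immersion $\varphi$, we define
several vector-valued functions $X$, $e_1$, $e_2$, and $e_3$ on $\mathcal{F}$ as follows:
\begin{align*}
X &= \varphi \circ \pi, \\
e_1(\langle f_1, f_2\rangle) &= \varphi_* f_1, \\
e_2(\langle f_1, f_2\rangle) &= \varphi_* f_2, \\
e_3 &= n \circ \pi.
\end{align*}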




(In the middle two equations we regard $\varphi_* f_i$ as elements of $\mathbb{E}^3$ via the identification
of $T_{\varphi(x)}\mathbb{E}^3$ with $\mathbb{E}^3$.) Thus at any point $z$ of $\mathcal{F}$, the vectors $e_1(z)$, $e_2(z)$, $e_3(z)$
form an orthonormal basis of $\mathbb{E}^3$, where $e_1(z)$ and $e_2(z)$ are tangent to the surface
at $\varphi(\pi(z)) = X(z)$ and $e_3(z)$ is orthogonal to this surface. We can therefore
write
\[ (dX \wedge e_3) = (dX, e_3) = 0 \quad\text{and}\quad (e_i, e_j) = \delta_{ij}. \]
By the first equation we can write
\[ dX = \omega_1 e_1 + \omega_2 e_2, \tag{II.1} \]
where
\[ \omega_1 = (dX, e_1) \quad\text{and}\quad \omega_2 = (dX, e_2) \]
are (real-valued) linear differential forms defined on $\mathcal{F}$.
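Similarly, let us define the forms $\omega_{ij}$ by setting
\[ \omega_{ij} = (de_i, e_j). \]
Applying $d$ to the equation $(e_i, e_j) = \delta_{ij}$ shows that
\[ \omega_{ij} = -\omega_{ji}. \tag{II.2} \]
If we apply $d$ to (II.1), we get
\[ 0 = d\,dX = d\omega_1\, e_1 - \omega_1 \wedge de_1 + d\omega_2\, e_2 - \omega_2 \wedge de_2. \]
Taking the scalar product of this equation with $e_1$ and $e_2$, respectively, shows
(since $\omega_{11} = 0$ and $\omega_{22} = 0$) that
\[ d\omega_1 = \omega_2 \wedge \omega_{21} \quad\text{and}\quad d\omega_2 = \omega_1 \wedge \omega_{12}. \tag{II.3} \]
If we apply $d$ to the equation
\[ de_i = \sum_k \omega_{ik} e_k, \]
we get
\[ 0 = \sum_k (d\omega_{ik}\, e_k - \omega_{ik} \wedge de_k), \]
and if we take the scalar product with $e_j$, we get
\[ d\omega_{ij} = \sum_k \omega_{ik} \wedge \omega_{kj}. \tag{II.4} \]
If we apply $d$ to the equation $(dX, e_3) = 0$, we get
\[ 0 = d(dX, e_3) = (dX \wedge de_3) = (\omega_1 e_1 + \omega_2 e_2 \wedge \omega_{31} e_1 + \omega_{32} e_2), \]
which implies that
\[ \omega_1 \wedge \omega_{31} + \omega_2 \wedge \omega_{32} = 0. \tag{II.5} \]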


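We will now interpret these equations. Let $z = \langle f_1, f_2\rangle$ be a point of $\mathcal{F}$.
For any $\xi \in T_z(\mathcal{F})$ we have
\[ \langle \xi, dX\rangle = \langle \xi, d\pi^*\varphi\rangle = \langle \xi, \pi^* d\varphi\rangle = \langle \pi_*\xi, d\varphi\rangle = \varphi_*(\pi_*\xi). \]
Therefore,
\begin{align*}
\langle \xi, \omega_1\rangle &= (\varphi_*\pi_*\xi, e_1) \\
 &= (\varphi_*\pi_*\xi, \varphi_* f_1) \\
 &= (\pi_*\xi, f_1), \tag{II.6}
\end{align*}
since the metric was defined to make $\varphi_*$ an isometry. In other words, $\langle \xi, \omega_1\rangle$
and $\langle \xi, \omega_2\rangle$ are the components of $\pi_*\xi$ with respect to the basis $\langle f_1, f_2\rangle$.
If $\eta$ is another tangent vector at $z$, then $\omega_1 \wedge \omega_2(\xi, \eta)$ is the (oriented) area of
the parallelogram spanned by $\pi_*\xi$ and $\pi_*\eta$. In other words,
\[ \omega_1 \wedge \omega_2 = \pi^*\Omega, \tag{II.7} \]
where $\Omega$ is the oriented area form on $M$.
Similarly,
\[ \langle \xi, de_3\rangle = n_*\pi_*\xi, \tag{II.8} \]
and we have
\begin{align*}
n_*\pi_*\xi &= \langle \xi, \omega_{31}\rangle e_1 + \langle \xi, \omega_{32}\rangle e_2 \\
 &= \langle \xi, \omega_{31}\rangle \varphi_* f_1 + \langle \xi, \omega_{32}\rangle \varphi_* f_2. \tag{II.9}
\end{align*}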


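Since we can regard $e_1$ and $e_2$ as an orthonormal basis of the tangent space to the
unit sphere, we conclude that $\omega_{31} \wedge \omega_{32}(\xi, \eta)$ is the oriented area on the unit
sphere of the parallelogram spanned by $n_*\pi_*\xi$ and $n_*\pi_*\eta$. Thus
\begin{align*}
\omega_{31} \wedge \omega_{32} &= \pi^* n^* \Omega_S \\
 &= \pi^* K\Omega \\
 &= K\,\omega_1 \wedge \omega_2.
\end{align*}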


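Let
\[ \begin{pmatrix} a & b \\ b' & c \end{pmatrix} \]
be the matrix of the linear transformation $n_*: T_x(M) \to T_{n(x)}(S^2)$ in terms of
the basis $\langle f_1, f_2\rangle$ of $T_x(M)$ and $\langle e_1, e_2\rangle$ of $T_{n(x)}(S^2)$. Then comparing (II.6)
with (II.9) shows that
\[ \omega_{31} = a\omega_1 + b\omega_2 \quad\text{and}\quad \omega_{32} = b'\omega_1 + c\omega_2. \tag{II.10} \]
If we substitute this into (II.5), we conclude that $b = b'$, i.e., that the
matrix of $n_*$ is symmetric. This suggests that it corresponds to a symmetric
bilinear form of some geometrical significance. In other words, we want to
consider the quadratic form
\[ a\omega_1^2 + 2b\,\omega_1\omega_2 + c\,\omega_2^2 \]
(where it is understood that this is the quadratic form on $T_z(\mathcal{F})$ which assigns
the number
\[ a\langle \xi, \omega_1\rangle^2 + 2b\langle \xi, \omega_1\rangle\langle \xi, \omega_2\rangle + c\langle \xi, \omega_2\rangle^2 \]
to any $\xi \in T_z(\mathcal{F})$).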


EXERCISES 


II.1 Show that
\[ a\langle \xi, \omega_1\rangle^2 + 2b\langle \xi, \omega_1\rangle\langle \xi, \omega_2\rangle + c\langle \xi, \omega_2\rangle^2 = (\varphi_*\pi_*\xi, n_*\pi_*\xi). \]

II.2 The quadratic form which assigns to each $\zeta \in T_x(M)$ the number $(\varphi_*\zeta, n_*\zeta)$
is called the second fundamental form of the surface. We shall denote it by $\mathrm{II}(\zeta)$.
(What is usually called the first fundamental form is just $\|\zeta\|^2$ in our terminology.)
Let $C$ be any smooth curve with $C'(0) = \zeta$. Show that
\[ \mathrm{II}(\zeta) = -\Bigl(\frac{d^2(\varphi \circ C)}{dt^2}(0),\; n(C(0))\Bigr). \]
Thus $\mathrm{II}(\zeta)$ measures how much the curve $\varphi \circ C$ is bending in the $n$-direction. Suppose
we choose $C$ to be such that $\varphi \circ C$ lies in the plane spanned by $\varphi_*\zeta$ and $n(x)$. [Geometrically,
this amounts to considering the curve obtained on the surface by intersecting
the surface with the plane spanned by $\varphi_*\zeta$ and $n(x)$.] Show that $\mathrm{II}(\zeta)$ is the curvature
of this plane curve.

In this sense, the second fundamental form $\mathrm{II}(\zeta)$ tells us how much the surface is
bending in the direction of $\zeta$.

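Note that
\[ K = \det \begin{pmatrix} a & b \\ b & c \end{pmatrix}. \]
Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of the matrix
\[ \begin{pmatrix} a & b \\ b & c \end{pmatrix}. \]
Thus
\[ \lambda_1 = \max \mathrm{II}(\zeta) \quad\text{and}\quad \lambda_2 = \min \mathrm{II}(\zeta) \quad\text{for } \|\zeta\| = 1. \]
If $\lambda_1 \neq \lambda_2$, there are two orthogonal eigenvectors which are called the directions of
principal curvature of the surface. (Note that they must be orthogonal, since they are
eigenvectors of a symmetric matrix.)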
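
As an added illustration of these notions (a sketch under assumptions that are not in the original: a circular cylinder of radius $R$, parametrized by an angle $\vartheta$ and a height $z$, with the outward unit normal), the matrix of $n_*$ can be written down directly.

```latex
% Immersion and outward normal (hypothetical parametrization for this example):
\[
\varphi(\vartheta, z) = (R\cos\vartheta,\; R\sin\vartheta,\; z), \qquad
n(\vartheta, z) = (\cos\vartheta,\; \sin\vartheta,\; 0).
\]
% Orthonormal frame: f_1 = (1/R) d/d(vartheta), f_2 = d/dz, so that
% e_1 = (-sin, cos, 0), e_2 = (0, 0, 1), and e_1 x e_2 = n.
\[
n_* f_1 = \tfrac{1}{R}\,(-\sin\vartheta,\; \cos\vartheta,\; 0) = \tfrac{1}{R}\,e_1, \qquad
n_* f_2 = 0,
\]
\[
\begin{pmatrix} a & b \\ b & c \end{pmatrix}
 = \begin{pmatrix} 1/R & 0 \\ 0 & 0 \end{pmatrix}, \qquad
K = 0, \quad \lambda_1 = 1/R, \quad \lambda_2 = 0 .
\]
% The principal directions are around the cylinder (f_1) and along its axis (f_2).
```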

If $A$ is a Euclidean motion of $\mathbb{E}^3$, then $\psi = A \circ \varphi$ is another immersion of
$M$, and it is easy to check that both the Riemann metric induced by $\psi$ and the second
fundamental form associated with $\psi$ coincide with those attached to $\varphi$. What is not
so obvious is the converse: If $\psi$ and $\varphi$ induce the same metric and the same second
fundamental form, then $\psi = A \circ \varphi$ for some Euclidean motion $A$. We will not prove this fact,
although it is a fairly easy consequence of what we have already established.


We have seen the meaning of $\omega_1$, $\omega_2$, $\omega_{31}$, and $\omega_{32}$ in geometric terms. Let us
now interpret the one remaining form, $\omega_{12}$.

Let $\gamma$ be a differentiable curve on $M$. A differentiable family of unit vectors
$f_1(\cdot)$ along $\gamma$ [where $f_1(s) \in T_{\gamma(s)}(M)$] is the same as a curve $C$ in $\mathcal{F}$ with $\pi \circ C = \gamma$.
[Here $C(s) = \langle f_1(s), f_2(s)\rangle$.] Let us call the family $f_1(s)$ parallel if the unit
vectors are all changing normally to the surface in three-space. In other words,
$f_1(s)$ is parallel if the vector
\[ \frac{d\,\varphi_* f_1(s)}{ds} \]
is normal to $\varphi(M)$ for all $s$. Let us see how to express this condition. Let $\xi_s$
be the tangent vector to the curve $C$ at $C(s)$. Then, by the definition of $de_1$,
\[ \frac{d\,\varphi_* f_1(s)}{ds} = \frac{d\, e_1(C(s))}{ds} = \langle \xi_s, de_1\rangle. \]
Note that $(\langle \xi_s, de_1\rangle, e_1(C(s))) = 0$ and $(\langle \xi_s, de_1\rangle, e_2(C(s))) = \langle \xi_s, \omega_{12}\rangle$. Now
$e_1$ and $e_2$ span the tangent space to $\varphi(M)$, so saying that $f_1(\cdot)$ is parallel is the
same as saying that $\langle \xi_s, \omega_{12}\rangle = 0$.

Thus $f_1(s)$ is parallel along $\gamma$ if and only if $\langle \xi_s, \omega_{12}\rangle = 0$.

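Let $M$ and $\bar{M}$ be two-dimensional manifolds with Riemann metrics.
Let $u: M \to \bar{M}$ be a differentiable map which is an isometry. Let $\mathcal{F}$ be the
manifold of orthonormal bases of $M$, and let $\bar{\mathcal{F}}$ be the manifold of orthonormal
bases of $\bar{M}$. Then $u$ induces a map $\tilde{u}$ of $\mathcal{F} \to \bar{\mathcal{F}}$ by
\[ \tilde{u}(\langle f_1, f_2\rangle) = \langle u_* f_1, u_* f_2\rangle. \]
Let $\bar{\omega}_1$ be the differential form on $\bar{\mathcal{F}}$ given, as in (II.6), by
\[ \langle \bar{\xi}, \bar{\omega}_1\rangle = (\bar{\pi}_*\bar{\xi}, \bar{f}_1) \quad\text{for } \bar{\xi} \in T_{\bar{z}}(\bar{\mathcal{F}}), \]
where $\bar{z} = \langle \bar{f}_1, \bar{f}_2\rangle$, with the corresponding definition for $\bar{\omega}_2$, $\omega_1$, and $\omega_2$.
Then for any $\xi \in T_z(\mathcal{F})$ we have, since $\bar{\pi} \circ \tilde{u} = u \circ \pi$,
\[ \langle \xi, \tilde{u}^*\bar{\omega}_1\rangle = \langle \tilde{u}_*\xi, \bar{\omega}_1\rangle = (\bar{\pi}_*\tilde{u}_*\xi, u_* f_1) = (u_*\pi_*\xi, u_* f_1) = (\pi_*\xi, f_1) = \langle \xi, \omega_1\rangle. \]
In other words,
\[ \tilde{u}^*\bar{\omega}_1 = \omega_1 \quad\text{and}\quad \tilde{u}^*\bar{\omega}_2 = \omega_2. \]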


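Now suppose that the metrics on $M$ and $\bar{M}$ come from immersions $\varphi$ and $\bar{\varphi}$.
Then we get forms $\omega_{ij}$ and $\bar{\omega}_{ij}$. Now by (II.3) we have
\[ \tilde{u}^*(\bar{\omega}_2 \wedge \bar{\omega}_{21}) = \tilde{u}^* d\bar{\omega}_1 = d(\tilde{u}^*\bar{\omega}_1) = d\omega_1 = \omega_2 \wedge \omega_{21}. \]
Thus
\[ \omega_2 \wedge \tilde{u}^*\bar{\omega}_{21} = \omega_2 \wedge \omega_{21} \quad\text{and}\quad \omega_1 \wedge \tilde{u}^*\bar{\omega}_{12} = \omega_1 \wedge \omega_{12}, \]
or
\[ (\tilde{u}^*\bar{\omega}_{12} - \omega_{12}) \wedge \omega_1 = 0 \quad\text{and}\quad (\tilde{u}^*\bar{\omega}_{12} - \omega_{12}) \wedge \omega_2 = 0. \]
Since the differential forms $\omega_1$ and $\omega_2$ are linearly independent, this can only
happen if
\[ \tilde{u}^*\bar{\omega}_{12} = \omega_{12}. \]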


In other words, if the two surfaces $\varphi(M)$ and $\bar{\varphi}(\bar{M})$ are isometric, they have
the “same” $\omega_{12}$, that is, the same notion of “parallel vector fields”. Observe
that a piece of a cylinder and a piece of the plane are isometric, even though they
are not congruent by a Euclidean motion. In different terms, while the forms $\omega_{13}$
and $\omega_{23}$ depend on how the surface is immersed in $\mathbb{E}^3$, the form $\omega_{12}$ depends
only on the Riemann metric induced by the immersion.

Now we have (II.4):
\[ d\omega_{12} = \omega_{13} \wedge \omega_{32} = -\pi^* n^* \Omega_S = -\pi^*(K\Omega). \]
From this we conclude that the Gaussian curvature $K$ also does not depend
on the immersion, but only on the Riemann metric coming from the immersion.

Since $\omega_{12}$ does not depend on $\varphi$, we should be able to define it for an arbitrary
two-dimensional manifold with a Riemann metric. Note that the preceding
argument shows that $\omega_{12}$ is uniquely determined by Eq. (II.3). It therefore
suffices to construct an $\omega_{12}$ on a coordinate neighborhood so as to satisfy (II.3).
It will then follow from the uniqueness that any two such coincide to give a
well-defined form. Let $U$ be a coordinate neighborhood of $M$, and let $\psi: U \to \mathcal{F}$
be a differentiable map such that $\pi \circ \psi = \mathrm{id}$. Thus $\psi$ assigns a basis $\langle f_1, f_2\rangle$
to each $x \in U$, in a differentiable manner. (One possible way to construct $\psi$ is
to apply the orthonormalization procedure to the vector fields $\langle \partial/\partial x^1, \partial/\partial x^2\rangle$.)

Once we have chosen $\psi$, any basis of $T_x$ differs from $\psi(x)$ by a rotation.
If we let $\tau$ denote the (angular) coordinate giving this rotation (so that $\tau$ is
only defined mod $2\pi$), then we can use the local coordinates on $U$ together
with $\tau$ as coordinates on $\pi^{-1}(U)$. More precisely, if $x^1$ and $x^2$ are local
coordinates on $U$, we define $y^1, y^2, \tau$ by
\[ y^1 = x^1 \circ \pi, \qquad y^2 = x^2 \circ \pi, \]
and $\tau(z)$ is given for $z = \langle e_1, e_2\rangle$ by
\[ e_1 = \cos\tau(z)\, f_1 + \sin\tau(z)\, f_2, \qquad e_2 = -\sin\tau(z)\, f_1 + \cos\tau(z)\, f_2, \tag{II.11} \]
where $\langle f_1, f_2\rangle = \psi(x)$ when $\langle e_1, e_2\rangle$ is a basis of $T_x(M)$.
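Now let
\[ \theta_1 = \psi^*(\omega_1) \quad\text{and}\quad \theta_2 = \psi^*(\omega_2), \]
so that $\theta_1$ and $\theta_2$ are forms defined on $U$ and are, in fact, the dual basis for $\psi(x)$
at each $x \in U$. If we set
\[ \alpha_1 = \pi^*\theta_1 \quad\text{and}\quad \alpha_2 = \pi^*\theta_2, \]
then (II.11) gives
\[ \omega_1 = \cos\tau\, \alpha_1 + \sin\tau\, \alpha_2 \quad\text{and}\quad \omega_2 = -\sin\tau\, \alpha_1 + \cos\tau\, \alpha_2. \]
Note that
\[ \omega_1 \wedge \omega_2 = \alpha_1 \wedge \alpha_2. \]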


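Define the functions $l_1$ and $l_2$ on $U$ by
\[ d\theta_1 = l_1\, \theta_1 \wedge \theta_2 \quad\text{and}\quad d\theta_2 = l_2\, \theta_1 \wedge \theta_2. \]
Let $k_1 = l_1 \circ \pi$ and $k_2 = l_2 \circ \pi$, so that
\[ d\alpha_1 = k_1\, \alpha_1 \wedge \alpha_2 \quad\text{and}\quad d\alpha_2 = k_2\, \alpha_1 \wedge \alpha_2. \]
Now
\begin{align*}
d\omega_1 &= -\sin\tau\, d\tau \wedge \alpha_1 + \cos\tau\, d\tau \wedge \alpha_2 + (k_1\cos\tau + k_2\sin\tau)\, \alpha_1 \wedge \alpha_2, \\
d\omega_2 &= -\cos\tau\, d\tau \wedge \alpha_1 - \sin\tau\, d\tau \wedge \alpha_2 + (k_2\cos\tau - k_1\sin\tau)\, \alpha_1 \wedge \alpha_2.
\end{align*}
Since $\omega_1 \wedge \omega_2 = \alpha_1 \wedge \alpha_2$, we can rewrite these equations as
\begin{align*}
d\omega_1 &= \bigl(d\tau + (k_1\cos\tau + k_2\sin\tau)\,\omega_1\bigr) \wedge \omega_2, \\
d\omega_2 &= -\bigl(d\tau + (k_2\cos\tau - k_1\sin\tau)\,\omega_2\bigr) \wedge \omega_1.
\end{align*}
We thus see that the form
\begin{align*}
\omega_{12} &= d\tau + (k_1\cos\tau + k_2\sin\tau)\,\omega_1 + (-k_1\sin\tau + k_2\cos\tau)\,\omega_2 \\
 &= d\tau + k_1\alpha_1 + k_2\alpha_2
\end{align*}
satisfies the desired equations.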
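
The recipe above (choose a frame, compute $l_1, l_2$ from $d\theta_1, d\theta_2$, then assemble $\omega_{12} = d\tau + k_1\alpha_1 + k_2\alpha_2$) can be tried on a familiar case. The following is an added sketch assuming the punctured Euclidean plane with polar coordinates $(r, \vartheta)$ and the frame $f_1 = \partial/\partial r$, $f_2 = (1/r)\,\partial/\partial\vartheta$; the symbol $\vartheta$ is used for the polar angle to avoid a clash with the forms $\theta_i$.

```latex
% Dual forms of the polar frame and their exterior derivatives:
\begin{align*}
\theta_1 &= dr, & d\theta_1 &= 0 & &\Longrightarrow\; l_1 = 0, \\
\theta_2 &= r\,d\vartheta, & d\theta_2 &= dr \wedge d\vartheta = \tfrac{1}{r}\,\theta_1 \wedge \theta_2
  & &\Longrightarrow\; l_2 = \tfrac{1}{r}.
\end{align*}
% Assemble the connection form on pi^{-1}(U):
\[
\omega_{12} = d\tau + \pi^*\bigl(l_1\theta_1 + l_2\theta_2\bigr) = d\tau + \pi^* d\vartheta .
\]
% Here tau + vartheta is the angle a unit vector makes with the fixed x-axis,
% so omega_12 is exact and d(omega_12) = 0, consistent with K = 0 for the plane.
```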

As before, on any two-dimensional Riemann manifold we will call a family
of unit vectors parallel along a curve $\gamma$ if $\langle \xi_s, \omega_{12}\rangle = 0$. With this definition of
parallel translation we can state the following:

Theorem. Let $\gamma$ be any differentiable curve on $M$. Given the unit vector
$g_1 \in T_{\gamma(0)}(M)$, there is a unique parallel family of unit vectors $g_1(s)$ along $\gamma$,
with $g_1(0) = g_1$. If $g_1'(0)$ is another unit vector of $T_{\gamma(0)}(M)$ differing from $g_1$
by an angle $\phi$, then $g_1'(s)$ differs from $g_1(s)$ by the same angle $\phi$ for all $s$.


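Proof. It is clearly sufficient (by breaking $\gamma$ up into small pieces if necessary) to
prove the theorem for curves $\gamma$ lying entirely in a coordinate chart. Then we
can use the local expression for $\omega_{12}$.

Let us rewrite the condition for parallel translation along $\gamma(s)$. In terms of
local coordinates, the unit vector $g_1(s)$ is given by a function $\tau(s)$, where
\[ g_1(s) = \cos\tau(s)\, f_1(\gamma(s)) + \sin\tau(s)\, f_2(\gamma(s)). \]
Then
\begin{align*}
\langle \xi_s, \omega_{12}\rangle &= \langle \xi_s, d\tau\rangle + \langle \xi_s, k_1\alpha_1 + k_2\alpha_2\rangle \\
 &= \frac{d\tau(s)}{ds} + \langle \xi_s, \pi^*(l_1\theta_1 + l_2\theta_2)\rangle \\
 &= \frac{d\tau(s)}{ds} + \langle \pi_*\xi_s, l_1\theta_1 + l_2\theta_2\rangle.
\end{align*}
But $\pi_*\xi_s = \zeta_s$ is the tangent vector to $\gamma$ at $\gamma(s)$. Thus
\[ \langle \xi_s, \omega_{12}\rangle = \frac{d\tau(s)}{ds} - F_\gamma(s), \]
where $F_\gamma(s) = -\langle \zeta_s, l_1\theta_1 + l_2\theta_2\rangle$ is a function depending only on $s$. In
particular, $g_1(s)$ is parallel if and only if
\[ \frac{d\tau(s)}{ds} = F_\gamma(s). \]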


From this we see that given $g_1(0)$ there is a unique parallel family $g_1(s)$, starting
with $g_1(0)$. Furthermore, if $g_1'(0)$ is a second unit vector at $\gamma(0)$, the angle
between $g_1(s)$ and $g_1'(s)$ is equal to the angle between $g_1(0)$ and $g_1'(0)$. Thus
parallel translation preserves angles, which proves the theorem. $\square$


Note that if $M$ is (locally isometric to) Euclidean space, then we can choose
\[ f_1 = \frac{\partial}{\partial x^1} \quad\text{and}\quad f_2 = \frac{\partial}{\partial x^2}, \]
so that $\theta_1 = dx^1$ and $\theta_2 = dx^2$. In this case, $k_1 = k_2 = 0$ and $\tau$ is just the
angle that $g_1$ makes with $\partial/\partial x^1$, that is, with the $x^1$-axis. Thus $\omega_{12} = d\tau$
in this case. Then the condition for parallel translation becomes $d\tau/ds = 0$,
which coincides with the usual notion of parallelism in Euclidean geometry.
Note that in Euclidean space the parallelism does not depend on the curve $\gamma$.
This is not true in general.


Exercise II.3. Let $\gamma_1$ and $\gamma_2$ be two arcs of great circles joining the north and south
poles on $S^2$. Suppose that $\gamma_1$ and $\gamma_2$ are orthogonal at the poles. Let $\zeta$ be a tangent
vector at the north pole. Compare its translates to the south pole via $\gamma_1$ and $\gamma_2$.


Let $M$ be any two-dimensional Riemann manifold. For any curve $\gamma$ on $M$
there is an obvious way of choosing unit vectors along $\gamma$: just let $g_1(s)$ be the
unit tangent vector to $\gamma$ at $\gamma(s)$. Thus for every curve $\gamma$ on $M$ we get a curve,
which we shall call $\tilde{\gamma}$, on $\mathcal{F}$. [Here $\tilde{\gamma} = (\gamma(s), g_1(s), g_2(s))$ and $g_1(s)$ is the
tangent to $\gamma$ at $\gamma(s)$.]

We call the form $\tilde{\gamma}^*(\omega_{12})$ the geodesic curvature form of $\gamma$. [In the Euclidean
case this is just the ordinary curvature (see the exercises).]

Let us consider those curves whose geodesic curvatures vanish, i.e., those
curves whose tangent vectors are parallel. We shall call such a curve a geodesic
with respect to the given Riemann metric. Note that the condition that a curve
be geodesic is given, in local coordinates, by a second-order differential
equation. Therefore, a geodesic $C(\cdot)$ is uniquely specified by giving $C(t)$ and $C'(t)$
at any fixed value of $t$. In Chapter 13 we use the term “geodesic” to mean
a curve which locally minimizes length. It is the purpose of the next few exercises
to show that geodesics in our present sense have this property.


EXERCISES 


II.4 Let $x, y$ be local coordinates on $U \subset M$. Through each point of the curve
$y = 0$ (that is, the $x$-axis in the local coordinates), construct the unique geodesic
orthogonal to this curve. (See Fig. 11.16.) Let $s$ be the arc-length parameter along
the geodesic, so that the geodesic passing through $(u, 0)$ is given by
\[ (x(u, s),\; y(u, s)). \]
Show that the map $(u, s) \mapsto (x(u, s), y(u, s))$ has nonzero Jacobian at $(0, 0)$ and
therefore defines a coordinate system in some open subset $U' \subset U$.


Fig. 11.16        Fig. 11.17


II.5 We are going to make a further change of coordinates. Let $Y$ be the vector
field on $U'$ defined by the properties
\[ \|Y\| = 1, \qquad \Bigl(Y, \frac{\partial}{\partial s}\Bigr) = 0, \qquad \Bigl(Y, \frac{\partial}{\partial u}\Bigr) > 0. \]
Thus $Y$ is orthogonal to the geodesics $u = \text{const}$ and points in the increasing $u$-direction.
Let us consider the solution curves of this vector field parametrized by the initial
position along the geodesic $u = 0$. That is, let $v$ be the arc-length parameter along the
geodesic $u = 0$, and consider the map
\[ (u, v) \mapsto (u, s(u, v)), \]
where $s(u, v)$ is the $s$-coordinate of the intersection of the solution curve of $Y$ passing
through $(0, v)$ with the geodesic given by $u$. (See Fig. 11.17.) Again the existence
theorem and smooth dependence on parameters, together with the fact that the curves
$u = 0$ and $s = 0$ are already orthogonal, guarantee that we can find some neighborhood
$W$ so that $(u, v)$ are coordinates on $W$. We have thus constructed coordinates
such that the curves $u = \text{const}$ are geodesics and the curves $u = \text{const}$ and $v = \text{const}$
are orthogonal. Such a system of coordinates is called a geodesic parallel coordinate
system.

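II.6 Let $(u, v)$ be a coordinate system on $U \subset M$ for which $(\partial/\partial u, \partial/\partial v) = 0$, so
that the metric takes the form
\[ ds^2 = p^2\,du^2 + q^2\,dv^2. \]
Define the choice of frame $\psi$ by normalizing $\partial/\partial u$, $\partial/\partial v$ so that $\psi(x) = \langle f_1, f_2\rangle$,
where $f_1 = (\partial/\partial u)/\|\partial/\partial u\|$ and $f_2 = (\partial/\partial v)/\|\partial/\partial v\|$. Show that the forms $\theta_1$
and $\theta_2$ are given by
\[ \theta_1 = p\,du, \qquad \theta_2 = q\,dv, \]
and
\[ \omega_{12} = d\tau - \pi^*\Bigl(\frac{1}{q}\frac{\partial p}{\partial v}\,du - \frac{1}{p}\frac{\partial q}{\partial u}\,dv\Bigr) \]
and
\[ K = -\frac{1}{pq}\Bigl[\frac{\partial}{\partial u}\Bigl(\frac{1}{p}\frac{\partial q}{\partial u}\Bigr) + \frac{\partial}{\partial v}\Bigl(\frac{1}{q}\frac{\partial p}{\partial v}\Bigr)\Bigr]. \]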
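
As an added sanity check on the curvature formula of Exercise II.6, taken here in the form $K = -\frac{1}{pq}\bigl[\partial_u(\tfrac{1}{p}\partial_u q) + \partial_v(\tfrac{1}{q}\partial_v p)\bigr]$ for a metric $ds^2 = p^2\,du^2 + q^2\,dv^2$, one can evaluate it on a round sphere. The radius $R$ and the choice of $u$ as colatitude and $v$ as longitude are assumptions made only for this example.

```latex
% Round sphere of radius R, with 0 < u < pi the colatitude and v the longitude:
\[
ds^2 = R^2\,du^2 + R^2\sin^2 u\,dv^2, \qquad p = R, \quad q = R\sin u .
\]
% p is constant, so the second bracketed term vanishes; for the first,
\[
\frac{1}{p}\frac{\partial q}{\partial u} = \cos u, \qquad
\frac{\partial}{\partial u}\Bigl(\frac{1}{p}\frac{\partial q}{\partial u}\Bigr) = -\sin u ,
\]
\[
K = -\frac{1}{R^2\sin u}\,(-\sin u) = \frac{1}{R^2} .
\]
% This is the expected constant curvature of a sphere of radius R.
```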




II.7 Let $(u, v)$ be a geodesic parallel coordinate system, as in Exercise II.5. The
curve $C_u$ given by $C_u(v) = (u, v)$ is a geodesic. Thus $\langle \tilde{C}_u'(v), \omega_{12}\rangle = 0$. But in terms
of our local coordinates, $\langle \tilde{C}_u'(v), d\tau\rangle = 0$, since $C_u'(v)$ is always parallel to one of the
base vectors, $f_2$, and
\[ \langle \tilde{C}_u'(v), \pi^* du\rangle = \langle C_u'(v), du\rangle = 0, \]
since $u = \text{const}$ along $C_u$. Thus we conclude that $\partial q/\partial u = 0$, or $q = q(v)$. Let us
replace the parameter $v$ by $w = \int_0^v q(t)\,dt$. Then $(u, w)$ is a geodesic parallel coordinate
system for which we have
\[ ds^2 = p^2\,du^2 + dw^2, \]
and now the arc length along any curve $u = \text{const}$ is $\int dw$.

II.8 Show that for |w| sufficiently small, any curve joining (0, 0) to (0, w) must have 
arc length at least |w|. Conclude that (since the choice of our original curve s = 0 
was arbitrary) the geodesics locally minimize length. 

II.9 Let $\langle w, z\rangle$ be local coordinates on an open set U of a Riemann manifold 
with the property that the curves $C_z$ given by $C_z(w) = (w, z)$ are geodesics parametrized 
according to arc length. Thus z = const is a geodesic and $\|\partial/\partial w\| = 1$. Let 

$$a = \left(\frac{\partial}{\partial w}, \frac{\partial}{\partial z}\right).$$

Show that $\partial a/\partial w = 0$. [Hint: Show that by orthonormalizing $(\partial/\partial w, \partial/\partial z)$, we obtain 
a map $\psi$ whose associated forms $\theta_1$ and $\theta_2$ are given by $\theta_1 = dw + a\,dz$, $\theta_2 = b\,dz$, 
where $b = \|\partial/\partial z - a(\partial/\partial w)\|$. Then compute $d\theta_1$ and $d\theta_2$ and use the fact that z = const 
is a geodesic.] 

II.10 Construct geodesic polar coordinates. That is, for a fixed $p \in M$ let $\theta$ be an 
angular coordinate on the unit circle in $T_p(M)$. For each $\theta$ let $C_\theta(\cdot)$ be the unique 
geodesic, parametrized by arc length, such that $C_\theta(0) = p$ and $C_\theta'(0)$ corresponds to 
angle $\theta$ in $T_p(M)$. Show that the map $(r, \theta) \mapsto C_\theta(r)$ gives “coordinates” on $U - \{p\}$, 
where U is some neighborhood of p. By taking $\langle r, \theta\rangle$ to be the $\langle w, z\rangle$ of Exercise 
II.9 and passing to the limit r = 0, conclude that 

$$\left(\frac{\partial}{\partial r}, \frac{\partial}{\partial \theta}\right) \equiv 0.$$

Thus in terms of the “coordinates” $\langle r, \theta\rangle$ on $U - \{p\}$, the Riemann metric takes 
the form 

$$ds^2 = dr^2 + b^2\,d\theta^2.$$

These coordinates are the analogue, for a general two-dimensional Riemann manifold, 
of the polar coordinates introduced on the plane and on the sphere at the end of 
Chapter 9. The argument given there applies generally to give another proof of the 
fact that geodesics locally minimize arc length. 


We now continue to study the consequences of the equation 

$$d\omega_{12} = -\pi^*(K\Omega).$$

Let D be a domain with regular boundary on M, and let $\psi$ be a map of some 
neighborhood of D into $\mathcal{F}$ satisfying $\pi \circ \psi$ = identity. Then, by Stokes’ theorem, 

$$\int_{\partial D} \psi^*(\omega_{12}) = -\int_D \psi^*\pi^*K\Omega = -\int_D K\Omega, \tag{II.12}$$

where $\Omega$ is the area form on M. 

Let us apply this to the map $\psi$ constructed as follows: Let Y be a vector field 
on a compact manifold M which vanishes at only a finite number of points 
$p_1, \ldots, p_k$. At any $x \in M$ where $Y_x \ne 0$, put $f_1(x) = Y_x/\|Y_x\|$, so $\psi(x) = 
\langle x, f_1(x), f_2(x)\rangle$. About each point $p_i$ choose a chart $(U_i, h_i)$ such that $U_i$ contains 
no other point $p_j$ and $h_i(p_i) = (0, 0)$ in the plane. Let $\gamma_{i,r}$ be $h_i^{-1}(C_r)$, 
where $C_r$ is the circle of radius r in the plane. [Here r is chosen so small that $C_r$ 
lies in $h_i(U_i)$.] Now let $D = M - \bigcup_i$ (interiors of $\gamma_{i,r}$). We must compute 
$\int_{\gamma_{i,r}} \psi^*(\omega_{12})$, where $\gamma_{i,r}$ is oriented clockwise rather than counterclockwise. 
For this purpose, we introduce about $p_i$ the orthonormal frames coming from 
the coordinates $x^1, x^2$. Thus we have coordinates $x^1, x^2, \tau$, where $\tau(x, f_1, f_2)$ 
measures the angle that $f_1(x)$ makes with $\partial/\partial x^1$. Thus (taking $\gamma_{i,r}$ clockwise) 

$$\int_{\gamma_{i,r}} \psi^*(d\tau) = -2\pi\,(\text{index of } Y \text{ at } p_i).$$

But $\psi^*\omega_{12} = \psi^*d\tau + \psi^*(k_1\,dx^1 + k_2\,dx^2)$, so 

$$\int_{\gamma_{i,r}} \psi^*\omega_{12} = -2\pi\,(\text{index of } Y \text{ at } p_i) + \int_{\gamma_{i,r}} k_1\,dx^1 + k_2\,dx^2.$$

Now as $r \to 0$, the second term on the right vanishes and $\int_D \to \int_M$ on the right 
of (II.12). Thus we have proved the following: 


Let Y be a vector field which vanishes at a finite number of points $p_1, \ldots, p_k$. 
Then 

$$\sum_i \text{index}_{p_i}(Y) = \frac{1}{2\pi}\int_M K\,dA. \tag{II.13}$$

In particular, the sum of the indices of a vector field is independent of the 
vector field, and the total integral of the curvature is independent of the 
Riemann metric. 


Thus, for instance, if M is $S^2$, the vector field tangent to the meridian circles 
has zeros only at the north and south poles, and the index at each of these 
points is +1. Thus (II.13) says that the sum of indices of any vector field on $S^2$ 
must equal 2. (In particular, it is impossible to construct a vector field on $S^2$ 
which does not vanish anywhere.) Furthermore, (II.13) says that no matter 
what Riemann metric we put on $S^2$, we have 

$$\int_{S^2} K\,dA = 4\pi.$$

Similarly, if M is the torus, we can put on it a vector field which does not 
vanish anywhere, so the integer in (II.13) is 0. 


EXERCISES 


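Let M denote the torus with angular coordinates $\varphi_1, \varphi_2$ (defined up to $2\pi$). The map F 
will denote the immersion of M into Euclidean three-space given by 

$$x_1 = (2 + \cos\varphi_1)\cos\varphi_2,$$
$$x_2 = (2 + \cos\varphi_1)\sin\varphi_2,$$
$$x_3 = \sin\varphi_1.$$

II.11 What is the Riemann metric on M induced by F? That is, compute $(\partial/\partial\varphi_1, 
\partial/\partial\varphi_1)_p$, $(\partial/\partial\varphi_1, \partial/\partial\varphi_2)_p$ and $(\partial/\partial\varphi_2, \partial/\partial\varphi_2)_p$ for each $p \in M$. (These can be given as 
functions of $\varphi_1, \varphi_2$.) Let $f_1, f_2$ denote the vector fields obtained by orthonormalizing 
$\partial/\partial\varphi_1, \partial/\partial\varphi_2$. What is the explicit expression for $f_1, f_2$? 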

II.12 What is the total area of the torus relative to this Riemann metric? 

II.13 In terms of the vector fields $f_1, f_2$ given in Exercise II.11 we can introduce 
(angular) coordinates $\varphi_1, \varphi_2, \tau$ on $\mathcal{F}$. In terms of these coordinates, what are the 
explicit expressions for the vector-valued functions $X, f_1, f_2, f_3$ on $\mathcal{F}$? (Each of these 
vector-valued functions should be given in terms of the fixed standard basis of Euclidean 
three-space, i.e., a triplet of functions of $\varphi_1, \varphi_2, \tau$.) Thus 

$$X(\varphi_1, \varphi_2, \tau) = \langle (2 + \cos\varphi_1)\cos\varphi_2,\ (2 + \cos\varphi_1)\sin\varphi_2,\ \sin\varphi_1\rangle.$$

What are the corresponding expressions for $f_1$, $f_2$, and $f_3$? 


II.14 What are the explicit expressions for the forms $\omega_1$, $\omega_2$, and $\omega_{12}$ on $\mathcal{F}$ given in 
terms of $d\varphi_1$, $d\varphi_2$, and $d\tau$? 

II.15 What is the Gaussian curvature K of M? (Again, K can be given as a function 
of $\varphi_1, \varphi_2$.) For which points of M is the Gaussian curvature positive, and for which 
points is it negative? 

II.16 On any Riemann manifold there is an obvious way of identifying a linear 
differential form with a vector field [via the identification of $T_x^*(M)$ with $T_x(M)$ given 
by the scalar product in $T_x(M)$]. If g is a function, the vector field associated to the 
form dg is denoted by grad g. Let M be the torus with Riemann metric as above, and 
let $x_1$ be the function given by $x_1(\varphi_1, \varphi_2) = (2 + \cos\varphi_1)\cos\varphi_2$, as before. Show that 
the vector field X = grad $x_1$ vanishes at exactly four points, and compute its index 
at each of these points. 
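
A numerical sanity check can help when working Exercises II.12 and II.15. The sketch below is not part of the original text. It assumes the induced metric has the orthogonal form $ds^2 = p^2\,d\varphi_1^2 + q^2\,d\varphi_2^2$ with $p = 1$ and $q = 2 + \cos\varphi_1$ (obtained by differentiating the immersion F by hand), and it evaluates the curvature formula of Exercise II.6 by central differences. The function names and the step size `h` are illustrative choices.

```python
import math

# Metric coefficients of ds^2 = p^2 du^2 + q^2 dv^2 with u = phi_1, v = phi_2.
# Assumption: these closed forms come from differentiating the immersion F by hand.
def p(u, v):
    return 1.0

def q(u, v):
    return 2.0 + math.cos(u)

def gaussian_curvature(u, v, h=1e-3):
    """K = -(1/(p q)) [d/du((1/p) dq/du) + d/dv((1/q) dp/dv)], by central differences."""
    def a(uu):  # (1/p) dq/du evaluated at (uu, v)
        return (q(uu + h, v) - q(uu - h, v)) / (2 * h) / p(uu, v)

    def b(vv):  # (1/q) dp/dv evaluated at (u, vv)
        return (p(u, vv + h) - p(u, vv - h)) / (2 * h) / q(u, vv)

    da_du = (a(u + h) - a(u - h)) / (2 * h)
    db_dv = (b(v + h) - b(v - h)) / (2 * h)
    return -(da_du + db_dv) / (p(u, v) * q(u, v))

def integrate_over_torus(f, n=64):
    """Midpoint rule for the integral of f against the area form p*q du dv."""
    step = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * step
        for j in range(n):
            v = (j + 0.5) * step
            total += f(u, v) * p(u, v) * q(u, v)
    return total * step * step

area = integrate_over_torus(lambda u, v: 1.0)
total_curvature = integrate_over_torus(gaussian_curvature)

if __name__ == "__main__":
    print("area:", area, "expected 8*pi^2 =", 8 * math.pi ** 2)
    print("total curvature / (2*pi):", total_curvature / (2 * math.pi))
```

The total curvature comes out numerically zero, which agrees with the remark that the integer in (II.13) is 0 for the torus.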


Let $x^1, x^2$ be a local system of coordinates on $U \subset M$, and let $\psi: U \to \mathcal{F}$ be given by 
orthonormalizing $\langle\partial/\partial x^1, \partial/\partial x^2\rangle$. Thus coordinates on $\pi^{-1}(U)$ are $x^1\circ\pi$, $x^2\circ\pi$, $\tau$, 
where $\tau(z)$ is the angle that $f_1$ makes with $\partial/\partial x^1$ if $z = \langle f_1, f_2\rangle$. Then 

$$\omega_{12} = d\tau + k_1\,dx^1 + k_2\,dx^2 = d\tau + \pi^*\psi^*\omega_{12}$$

on $\pi^{-1}(U)$. Let C be a curve in U with $C'(t) \ne 0$, and let $\bar{C}$ be the curve in $\pi^{-1}(U)$ 
assigning to each t the unit tangent vector $C'(t)/\|C'(t)\|$. Then 

$$\int_{\bar{C}} \omega_{12} = \int_C \psi^*\omega_{12} + \int_{\bar{C}} d\tau,$$

where $\tau$ is the angle that $C'(t)$ makes with the $x^1$-axis. 

From this formula we can deduce a number of interesting consequences which we 
state in exercise form. 


II.17 Show that the sum of the exterior angles of a geodesic triangle D is given by 
$2\pi - \int_D K\Omega$. (A geodesic triangle is a domain whose boundary consists of three 
geodesics.) 

II.18 Suppose that D is a domain which in the local coordinates is 
given by a simple polygon. Thus $\partial D = C_1 \cup \cdots \cup C_k$, where each 
$C_i$ is a curve which is a straight line segment in the local coordinate 
system. Let $\alpha_1, \ldots, \alpha_k$ be the interior angles. (See Fig. 11.18.) 
Show that 

$$\sum_i \int_{\bar{C}_i} \omega_{12} + \int_D K\Omega = 2\pi - \sum_i (\pi - \alpha_i).$$

Fig. 11.18 


II.19 This gives us another way of computing the integer in (II.13) if M is a compact 
surface. 
Definition. A cellulation of a compact differentiable two-dimensional manifold M 
is a finite collection of closed subsets (called the cells) $F_1, \ldots, F_m$ such that 

$$M = \bigcup_{i=1}^{m} F_i$$

and which satisfy the following: 
1) For each $F_i$ there is a one-to-one bidifferentiable map $f_i$ of a neighborhood of 
$F_i$ onto a polygon with $n_i$ edges ($n_i \ge 3$). 
2) For $i \ne j$, either $F_i \cap F_j$ is empty or $f_i(F_i \cap F_j)$ is a fixed edge or vertex of 
the corresponding polygon. 


Let f be the number of faces, e the number of edges, and v the number of vertices 
in the cellulation of a two-dimensional Riemann manifold. Show that 

$$f - e + v = \frac{1}{2\pi}\int_M K\Omega.$$

Thus f - e + v does not depend on the cellulation. We have thus given three distinct 
ways of computing an integer attached to the manifold. This integer is called the Euler 
characteristic and is denoted by $\chi(M)$. Thus 

$$\chi(M) = \frac{1}{2\pi}\int_M K\Omega = f - e + v = \sum \text{index } Y.$$


II.20 By a regular cellulation we mean a cellulation such that each face has the same 
number of edges and each vertex lies on the same number of edges. Show that 
there are at most five possibilities for the number of faces in a regular cellulation of the 
sphere. Conclude that there are at most five “regular solids”. 
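
The counting behind Exercise II.20 can be explored with a short enumeration. The sketch below is an illustration, not the book's argument. It assumes that every face has s edges and every vertex lies on t edges with s, t ≥ 3, so that s·f = 2e and t·v = 2e (each edge borders two faces and joins two vertices), and it uses f - e + v = 2 for the sphere. The names `s`, `t`, and `max_sides` are hypothetical.

```python
from fractions import Fraction

def regular_cellulations(max_sides=20):
    """List (s, t, f, e, v) for regular cellulations of the sphere.

    s = edges per face, t = edges per vertex (both assumed >= 3).
    From s*f = 2e, t*v = 2e and f - e + v = 2 one gets
    e = 2*s*t / (2*s + 2*t - s*t), which must be a positive integer.
    """
    solutions = []
    for s in range(3, max_sides + 1):
        for t in range(3, max_sides + 1):
            denom = 2 * s + 2 * t - s * t
            if denom <= 0:
                continue  # no positive edge count is possible
            e = Fraction(2 * s * t, denom)
            f = 2 * e / s
            v = 2 * e / t
            if e.denominator == 1 and f.denominator == 1 and v.denominator == 1:
                solutions.append((s, t, int(f), int(e), int(v)))
    return solutions

if __name__ == "__main__":
    for s, t, f, e, v in regular_cellulations():
        print(f"faces with {s} edges, {t} edges per vertex: f={f}, e={e}, v={v}")
```

Raising `max_sides` does not produce further solutions, because the denominator 2s + 2t - st is positive only for finitely many pairs.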


CHAPTER 12 
POTENTIAL THEORY IN $E^n$ 


1. SOLID ANGLE 
In what follows $x^1, \ldots, x^n$ will denote Euclidean coordinates on $E^n$. Let 
$r^2 = (x^1)^2 + \cdots + (x^n)^2$; then $dr^2 = 2r\,dr$, so that we have 

$$r\,dr = \sum x^i\,dx^i,$$
$$*r\,dr = \sum (-1)^{i-1} x^i\,dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n,$$
$$d*r\,dr = n\,dx^1 \wedge \cdots \wedge dx^n.$$

Let $\iota$ denote the injection of $S^{n-1} \to E^n$ as the unit sphere. Let $V_n$ denote the 
volume of the unit ball and $A_{n-1}$ the volume of the unit (n - 1)-sphere. The 
volume of the sphere of radius r is thus $r^{n-1}A_{n-1}$, and so 

$$V_n = \int_0^1 r^{n-1}A_{n-1}\,dr = \frac{1}{n}A_{n-1}.$$
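
The relation $A_{n-1} = nV_n$ can be checked numerically. The sketch below is an aside that assumes the standard closed forms $V_n = \pi^{n/2}/\Gamma(n/2 + 1)$ and $A_{n-1} = 2\pi^{n/2}/\Gamma(n/2)$, which are not derived in this section. It also evaluates the radial integral above by the midpoint rule. All function names are illustrative.

```python
import math

def unit_ball_volume(n):
    """V_n, assuming the closed form pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def unit_sphere_area(n):
    """A_{n-1}, the volume of the unit (n-1)-sphere in E^n, assuming 2 pi^(n/2) / Gamma(n/2)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

def radial_integral(n, steps=2000):
    """Midpoint-rule value of the integral of r^(n-1) * A_{n-1} over 0 <= r <= 1."""
    dr = 1.0 / steps
    area = unit_sphere_area(n)
    return sum(((k + 0.5) * dr) ** (n - 1) * area * dr for k in range(steps))

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, unit_ball_volume(n), unit_sphere_area(n) / n, radial_integral(n))
```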
0 th 


Since $\iota^*(*r\,dr)$ is an (n - 1)-form invariant under rotations, it is some multiple of 
the volume form on $S^{n-1}$. By Stokes’ theorem, 

$$\int_{S^{n-1}} \iota^*(*r\,dr) = \int_{B_1} d*r\,dr = n\int_{B_1} dx^1 \wedge \cdots \wedge dx^n,$$

where $B_1$ is the unit ball. Comparing this with the above, we conclude that $\iota^*(*r\,dr)$ is 
the volume form on the unit sphere. 
Let $\rho$ denote the projection of $E^n - \{0\}$ onto the unit sphere. Set 

$$\tau = \rho^*\iota^*(*r\,dr).$$

Then $\tau$ is called the element of solid angle. Integrated over any (n - 1)-surface, 
it gives the volume (counting sign of orientation and multiplicity) of its projection 
on the unit (n - 1)-sphere. 
We have 

$$d\tau = 0,$$

since 

$$d\tau = d\rho^*\iota^*(*r\,dr) = \rho^*(d\iota^*(*r\,dr)) = 0,$$

because $\iota^*(*r\,dr)$ is an (n - 1)-form on the (n - 1)-dimensional manifold $S^{n-1}$, 
and d of any (n - 1)-form on an (n - 1)-dimensional manifold must vanish. 


Let $\iota_R$ denote the injection of $S^{n-1}$ into $E^n$ as the sphere of radius R 
(so that $\iota_1 = \iota$). It then follows directly from the definitions that 

$$\iota_R^*\tau = \frac{dS_R}{R^{n-1}},$$

where $dS_R$ is the induced volume form. (See Fig. 12.1.) 


Fig. 12.1 


Now the volume of the ball of radius R is $R^nV_n$ and the surface volume 
(area) of the sphere of radius R is $R^{n-1}A_{n-1} = R^{-1}(nR^nV_n)$. Now $\iota_R^*(*dr)$ is 
an (n - 1)-form and thus $\iota_R^*(*dr) = f\,dS_R$ for some function f. Since $*dr$ and 
$dS_R$ are invariant under rotation, we conclude that f is a constant. But 

$$\int_{S^{n-1}} \iota_R^*(*r\,dr) = n\int_{B_R} dx^1 \wedge \cdots \wedge dx^n = nR^nV_n,$$

from which we conclude that 

$$dS_R = \frac{\iota_R^*(*r\,dr)}{R} = \iota_R^*(*dr).$$

Thus 

$$\iota_R^*\tau = \iota_R^*\left(\frac{*dr}{r^{n-1}}\right).$$

Now let x be any point in $E^n - \{0\}$, and let $\xi_1, \ldots, \xi_{n-1}$ be tangent vectors 
at x. Then 

$$\left(\tau - \frac{*dr}{r^{n-1}}\right)(\xi_1, \ldots, \xi_{n-1})$$

will vanish if all the $\xi$’s are tangent to the sphere centered at the origin and 
passing through x, by the above equation. If one of the $\xi$’s, say $\xi_1$, is a multiple 
of $(\partial/\partial r)_x$, then again the expression will vanish, because $\tau(\xi_1, \ldots, \xi_{n-1}) = 
\iota^*(*r\,dr)(\rho_*\xi_1, \ldots, \rho_*\xi_{n-1})$ and $\rho_*\xi_1 = 0$, and because $*dr(\xi_1, \ldots, \cdot) = 0$. By 
multilinearity, we conclude that the expression vanishes identically and therefore 
that 

$$\tau = \frac{*dr}{r^{n-1}} = \frac{*r\,dr}{r^n} = \frac{\sum (-1)^{i-1} x^i\,dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n}{r^n},$$

where the $\widehat{\ }$ indicates that the corresponding term is omitted. 
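
For n = 2 the formula above reads $\tau = (x^1\,dx^2 - x^2\,dx^1)/r^2$, and its integral over a closed curve should equal $A_1 = 2\pi$ times the number of times the curve winds around the origin. The sketch below is a hedged numerical illustration of that statement for an off-center ellipse traversed counterclockwise. The parametrization and the function name are assumptions made for the example.

```python
import math

def solid_angle_integral(cx, cy, a, b, steps=20000):
    """Integrate tau = (x dy - y dx) / r^2 over the ellipse
    (cx + a cos t, cy + b sin t), 0 <= t <= 2 pi, by the midpoint rule."""
    dt = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        x = cx + a * math.cos(t)
        y = cy + b * math.sin(t)
        dx = -a * math.sin(t)   # dx/dt
        dy = b * math.cos(t)    # dy/dt
        total += (x * dy - y * dx) / (x * x + y * y) * dt
    return total

if __name__ == "__main__":
    # Ellipse enclosing the origin: the projection covers the unit circle once.
    print(solid_angle_integral(0.3, -0.2, 2.0, 1.0), 2 * math.pi)
    # Ellipse not enclosing the origin: signed contributions cancel.
    print(solid_angle_integral(5.0, 0.0, 2.0, 1.0))
```

That the second integral vanishes reflects $d\tau = 0$ together with Stokes' theorem on the region bounded by the curve, which does not contain the origin.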




Fig. 12.2 


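2. GREEN’S FORMULAS 
Let u and v be smooth functions on $E^n$. Then 

$$du = \sum \frac{\partial u}{\partial x^i}\,dx^i,$$

so 

$$*du = \sum (-1)^{i-1}\frac{\partial u}{\partial x^i}\,dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n,$$

and therefore 

$$d*du = \left[\sum \frac{\partial^2 u}{(\partial x^i)^2}\right] dx^1 \wedge \cdots \wedge dx^n = \Delta u\,dx^1 \wedge \cdots \wedge dx^n,$$

where the operator 

$$\Delta = \frac{\partial^2}{(\partial x^1)^2} + \cdots + \frac{\partial^2}{(\partial x^n)^2}$$

is called the Laplacian. 


Let U be an open subset with compact closure and almost regular boundary. 
(See Fig. 12.2.) Set 

$$D_U[u, v] = \int_U du \wedge *dv = \int_U dv \wedge *du = \int_U \sum \frac{\partial u}{\partial x^i}\frac{\partial v}{\partial x^i}\,dx^1 \wedge \cdots \wedge dx^n.$$

Now $\int_{\partial U} u*dv = \int_U d(u*dv)$. But $d(u*dv) = du \wedge *dv + u\,d*dv$, so 

$$\int_{\partial U} u*dv = D_U[u, v] + \int_U (u\,\Delta v)\,dx^1 \wedge \cdots \wedge dx^n. \tag{2.1}$$

Since $D_U$ is symmetric in u and v, we have 

$$\int_{\partial U} u*dv - v*du = \int_U (u\,\Delta v - v\,\Delta u)\,dx^1 \wedge \cdots \wedge dx^n. \tag{2.2}$$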

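Suppose U contains the origin, and let $U_\epsilon = U - B_\epsilon$, where $B_\epsilon$ is the ball of 
radius $\epsilon$ centered at the origin. Let $v = r^{2-n}$.† Then $dv = (2 - n)r^{1-n}\,dr$ and 
$*dv = -(n - 2)\tau$, so that $d*dv = 0$. Substituting into (2.2) and taking $\epsilon$ 
small enough so that $\partial U_\epsilon = \partial U - \partial B_\epsilon$, we get 

$$-\int_{\partial U}\left[(n - 2)\,u\tau + r^{2-n}*du\right] + (n - 2)\int_{\partial B_\epsilon} u\tau + \frac{1}{\epsilon^{n-2}}\int_{\partial B_\epsilon} *du = -\int_{U_\epsilon} r^{2-n}\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n.$$

†For n = 2, set v = log r. 

Let us examine what happens as $\epsilon \to 0$. The first term on the left does not 
depend on $\epsilon$. For the second term, let us write $u = u(0) + O(\epsilon)$. Then the 
second term becomes $(n - 2)A_{n-1}u(0) + O(\epsilon)$. For the third term on the left, 
we write 

$$\frac{1}{\epsilon^{n-2}}\int_{\partial B_\epsilon} *du = \frac{1}{\epsilon^{n-2}}\int_{B_\epsilon} d*du = \frac{1}{\epsilon^{n-2}}\int_{B_\epsilon} \Delta u\,dx^1 \wedge \cdots \wedge dx^n.$$

Thus the third term on the left vanishes to order $\epsilon^2$. Since $r^{2-n}$ is an integrable 
function on $E^n$, the right-hand side tends to $-\int_U r^{2-n}\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n$. 
Therefore, if we let $\epsilon \to 0$, we get 

$$u(0) = \frac{1}{A_{n-1}}\int_{\partial U}\left(u\tau + \frac{1}{n - 2}\,r^{2-n}*du\right) - \frac{1}{(n - 2)A_{n-1}}\int_U r^{2-n}\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n. \tag{2.3}$$

In particular, if u is a harmonic function, that is, $\Delta u = 0$, then 

$$u(0) = \frac{1}{A_{n-1}}\int_{\partial U} u\tau + \frac{1}{(n - 2)A_{n-1}}\int_{\partial U} r^{2-n}*du. \tag{2.4}$$

In the special case $U = B_a$, the second term on the right-hand side of (2.4) 
becomes 

$$\frac{1}{(n - 2)A_{n-1}a^{n-2}}\int_{\partial B_a} *du = \frac{1}{(n - 2)A_{n-1}a^{n-2}}\int_{B_a} d*du = 0.$$

Thus (by the definition of $A_{n-1}$ and $\tau$) 

$$u(0) = \frac{\int_{S_a} u\,dS}{\int_{S_a} dS}, \tag{2.5}$$

where $S_a$ is the sphere of radius a and dS is its volume element. In other words, 
if u is a function that is harmonic in some domain, then the value of u at any 
point is equal to its average value on any sphere centered at that point whose 
interior is completely contained in the domain. 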
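
The mean value property can be tried out numerically in $E^3$. The sketch below is an illustration only. It averages a function over a sphere by sampling uniformly in $w = \cos(\text{polar angle})$ and in the azimuth, which is uniform with respect to surface area, and compares the average with the value at the center. The test functions, the center, the radius, and the grid sizes are arbitrary choices.

```python
import math

def u_harmonic(x, y, z):
    # Laplacian: 2 + 2 - 4 = 0 for the quadratic part; linear terms are harmonic.
    return x * x + y * y - 2 * z * z + 3 * x - y + 1.0

def u_not_harmonic(x, y, z):
    # Laplacian is 6, so the mean value property should fail.
    return x * x + y * y + z * z

def sphere_average(f, center, a, n_w=400, n_phi=64):
    """Average of f over the sphere of radius a about center (midpoint rule).

    The area element of the unit sphere is dw dphi with w = cos(polar angle),
    so a uniform grid in (w, phi) gives equal-area weights.
    """
    cx, cy, cz = center
    total = 0.0
    for i in range(n_w):
        w = -1.0 + (i + 0.5) * 2.0 / n_w
        s = math.sqrt(1.0 - w * w)
        for j in range(n_phi):
            phi = (j + 0.5) * 2 * math.pi / n_phi
            total += f(cx + a * s * math.cos(phi), cy + a * s * math.sin(phi), cz + a * w)
    return total / (n_w * n_phi)

if __name__ == "__main__":
    c, a = (0.4, -0.3, 0.2), 0.7
    print(sphere_average(u_harmonic, c, a), u_harmonic(*c))
    print(sphere_average(u_not_harmonic, c, a), u_not_harmonic(*c))
```

For the non-harmonic function the average exceeds the central value by about $a^2$, so the equality in (2.5) is specific to harmonic functions.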


3. THE MAXIMUM PRINCIPLE 


Equation (2.5) has a number of startling consequences which we shall now 
develop. Let u be a function that is harmonic in a domain U. Let $x_0$ be some 
point of U, and suppose that there is a neighborhood W of $x_0$ such that $W \subset U$ 
and 

$$u(x) \le u(x_0) \qquad \text{for all } x \in W.$$


Fig. 12.3    Fig. 12.4 


(See Fig. 12.3.) Let $S_a$ be a sphere of radius a centered at $x_0$, where a is so small 
that $B_a \subset W$. Then $u(x) \le u(x_0)$ for all $x \in S_a$. Therefore, 

$$\int_{S_a} u\,dS \le u(x_0)\int_{S_a} dS.$$

If there were some point $x \in S_a$ with $u(x) < u(x_0)$, then u(y) would be less than 
$u(x_0)$ for all y sufficiently close to x, since u is continuous. But then the above 
inequality would be a strict inequality, i.e., 

$$\int_{S_a} u\,dS < u(x_0)\int_{S_a} dS,$$

which contradicts (2.5). We must therefore have 

$$u(x) = u(x_0) \qquad \text{for all } x \in S_a.$$

Now suppose that W is an open set that is connected; that is, suppose that any 
two points of W can be joined by a continuous curve lying entirely in W. Let y 
be any point of W, and let C be a curve joining $x_0$ to y. About each x on the curve 
we can find a sufficiently small ball with center x lying entirely in W. By the 
compactness of C we can choose a finite number of these balls which cover C. 
We can therefore formulate the following: There are a finite number of spheres 
$S_{a_0}, \ldots, S_{a_k}$ such that each sphere and its interior lie entirely in W, $S_{a_0}$ has 
center $x_0$, the center $x_{i+1}$ of $S_{a_{i+1}}$ lies on $S_{a_i}$, and $y \in S_{a_k}$. (See Fig. 12.4.) But this 
implies that $u(x_0) = u(x_1) = \cdots = u(x_k) = u(y)$. In other words, we have 
established: 


Proposition 3.1. Let u be harmonic in a connected open set W, and suppose 
that u achieves its maximum value at some $x_0 \in W$. Then u is constant 
on W. 


An immediate corollary of this result is: 


Proposition 3.2. Let U be a connected open set with $\bar{U}$ compact. Then if 
u is a function that is continuous in $\bar{U}$ and harmonic in U, 

$$u(x) < \max_{y \in \partial U} u(y) \qquad \text{for all } x \in U \tag{3.1}$$

unless u is a constant. 


Proof. In fact, since $\bar{U}$ is compact and u is continuous, u must achieve its 
maximum at some point $x_0$ of $\bar{U}$. If we could actually choose $x_0 \in U$, then u 
would have to be a constant by Proposition 3.1. If u is not a constant, then 
$x_0 \in \partial U$, and we have (3.1). □ 
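
Inequality (3.1) can be observed on a concrete example. The sketch below is an illustration under stated assumptions: it takes the harmonic function $u(x, y) = e^x\cos y$ on the closed unit disk in the plane and compares the largest sampled value on interior circles with the largest sampled value on the boundary circle. The sampling resolution and the helper name are arbitrary.

```python
import math

def u(x, y):
    # Real part of exp(x + iy), hence harmonic in the plane.
    return math.exp(x) * math.cos(y)

def max_on_circle(f, radius, n_angles=720):
    """Largest sampled value of f on the circle of the given radius about 0."""
    return max(
        f(radius * math.cos(2 * math.pi * k / n_angles),
          radius * math.sin(2 * math.pi * k / n_angles))
        for k in range(n_angles)
    )

# Boundary of the unit disk versus interior circles of radius 0.00, 0.01, ..., 0.99.
boundary_max = max_on_circle(u, 1.0)
interior_max = max(max_on_circle(u, 0.01 * j) for j in range(100))

if __name__ == "__main__":
    print("max on boundary:", boundary_max)   # attained at (1, 0), value e
    print("max over sampled interior:", interior_max)
```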


From this we deduce: 


Proposition 3.3. Let U be a connected open set with $\bar{U}$ compact. Let u 
and v be functions that are continuous in $\bar{U}$ and harmonic in U. Suppose 
that 

$$u(y) = v(y) \qquad \text{for all } y \in \partial U.$$

Then 

$$u(x) = v(x) \qquad \text{for all } x \in U.$$


Proof. In fact, u - v satisfies the hypotheses of Proposition 3.2 and vanishes 
on $\partial U$. Thus $u(x) - v(x) \le 0$ for $x \in U$. Similarly $v(x) - u(x) \le 0$, which 
implies the proposition. □ 


An alternative way of formulating Proposition 3.3 is to say that on a domain 
U a harmonic function is completely determined by its boundary values. This 
is a uniqueness theorem: there is at most one harmonic function with given 
boundary values. It suggests the problem of deciding whether the corresponding 
existence theorem is true. This problem is known as Dirichlet’s problem. 


Dirichlet’s problem. Given a continuous function f defined on $\partial U$, does 
there exist a function u that is continuous in $\bar{U}$ and harmonic in U and such 
that u(y) = f(y) for all $y \in \partial U$? 


We shall show in Section 9 that we can always solve Dirichlet’s problem for 
domains with almost regular boundary. 


4. GREEN’S FUNCTIONS 


Suppose that U is a domain for which the Dirichlet problem can be solved. 
We shall show in this section that the solution can be given “explicitly” in terms 
of a certain integral over the boundary. 

We first introduce some notation. For each $x \in E^n$ set† 

$$K_n(x, y) = -\frac{1}{(n - 2)A_{n-1}\|x - y\|^{n-2}} \qquad \text{for } y \in E^n - \{x\} \quad (n \ge 3),$$

so that (for fixed x) 

$$*dK_n(x, \cdot) = \frac{1}{A_{n-1}}\,\tau_x \qquad \text{on } E^n - \{x\},$$

where $\tau_x$ denotes the solid angle about the point x. Then for $x \in U$, Eq. (2.3) 


†This is for $n \ge 3$. For n = 2, set $K_2(x, y) = (1/2\pi)\ln\|x - y\|$. 


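since $G(x, \cdot)$ and h are smooth functions on $B_{x,\epsilon}$. On the other hand, 

$$\int_{\partial B_{x,\epsilon}} G(x, \cdot) * dG(y, \cdot) = \int_{\partial B_{x,\epsilon}} K_n(x, \cdot) * dG(y, \cdot) + \int_{\partial B_{x,\epsilon}} h(x, \cdot) * dG(y, \cdot).$$

The second term tends to zero, as above. The first term can be written (for 
$n \ge 3$) 

$$\int_{\partial B_{x,\epsilon}} K_n(x, \cdot) * dG(y, \cdot) = -\frac{1}{A_{n-1}(n - 2)\epsilon^{n-2}}\int_{\partial B_{x,\epsilon}} *dG(y, \cdot) = -\frac{1}{A_{n-1}(n - 2)\epsilon^{n-2}}\int_{B_{x,\epsilon}} d*dG(y, \cdot) = 0,$$

since $G(y, \cdot)$ is harmonic in $B_{x,\epsilon}$. A similar argument works for n = 2 with 
$\epsilon^{2-n}$ replaced by $\log\epsilon$. This proves (4.3). 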

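Let u be any smooth function on $\bar{U}$. Apply (2.2) to u and $v = G(x, \cdot)$ 
on $U - B_{x,\epsilon}$. We get (since $G(x, \cdot) = 0$ on $\partial U$) 

$$\int_{\partial U} u * dG(x, \cdot) - \int_{\partial B_{x,\epsilon}} u * dG(x, \cdot) + \int_{\partial B_{x,\epsilon}} G(x, \cdot) * du = -\int_{U - B_{x,\epsilon}} G(x, \cdot)\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n.$$

The third integral on the left-hand side can be written as 

$$\int_{\partial B_{x,\epsilon}} K_n(x, \cdot) * du + \int_{\partial B_{x,\epsilon}} h(x, \cdot) * du = -\frac{1}{A_{n-1}(n - 2)\epsilon^{n-2}}\int_{\partial B_{x,\epsilon}} *du + O(\epsilon^{n-1}) = \frac{1}{A_{n-1}(n - 2)\epsilon^{n-2}}\,O(\epsilon^{n-1}) + O(\epsilon^{n-1})$$

and so tends to zero, while the second integral tends to u(x), as in the derivation 
of (2.3). (The usual modification works for n = 2.) We get 

$$u(x) = \int_U G(x, \cdot)\,\Delta u\,dx^1 \wedge \cdots \wedge dx^n + \int_{\partial U} u * dG(x, \cdot). \tag{4.4}$$

We observe that (4.4) shows that if we know that there exists a solution to 
$\Delta F = f$ with the boundary conditions F = 0, then it is given by 

$$F(x) = \int_U G(x, y)\,f(y)\,dy.$$

Similarly, if we know that there exists a smooth solution to the problem 

$$\Delta u = 0, \qquad u(x) = f(x) \quad \text{for } x \in \partial U, \tag{4.5}$$

then it is given by 

$$u(x) = \int_{\partial U} f * dG(x, \cdot). \tag{4.6}$$

It is important to observe that these formulas are consequences of the existence 
of Green’s function for U. Thus they are valid whenever we can find the function 
h such that properties (ii) and (iii) hold. 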




5. THE POISSON INTEGRAL FORMULA 


In this section we shall explicitly construct the Green function for a ball of 
radius R. Let $B_R$ be the ball of radius R centered at the origin. 

For any $x \in E^n - \{0\}$ let x' be the image of x under inversion with respect 
to the sphere of radius R: 

$$x' = \frac{R^2}{\|x\|^2}\,x.$$

Define the function $G_R$ by 

$$A_{n-1}(n - 2)\,G_R(x, y) = \begin{cases} -\dfrac{1}{\|y - x\|^{n-2}} + \dfrac{R^{n-2}}{\|x\|^{n-2}\,\|y - x'\|^{n-2}} & \text{if } x \ne 0, \\[2ex] -\dfrac{1}{\|y\|^{n-2}} + \dfrac{1}{R^{n-2}} & \text{if } x = 0.\text{†} \end{cases} \tag{5.1}$$

†If n = 2, set 

$$2\pi\,G_R(x, y) = \begin{cases} \log\|y - x\| - \log\left(\dfrac{\|x\|}{R}\,\|y - x'\|\right) & \text{if } x \ne 0, \\[2ex] \log\|y\| - \log R & \text{if } x = 0. \end{cases}$$

If $x \in B_R$, then $x' \notin \bar{B}_R$, and so the second terms on the right-hand side of 
(5.1) are continuous and harmonic on $\bar{B}_R$. We must merely check that property 
(iii) holds. Now for $\|y\| = R$ we have, by similar triangles (or direct 
computation), 

$$\frac{R}{\|x\|} = \frac{\|y - x'\|}{\|y - x\|}, \tag{5.2}$$

so that 

$$G_R(x, y) = 0 \qquad \text{for } \|y\| = R.$$

(See Fig. 12.5.) This is (iii), so we have verified that $G_R$ is the Green function 
for the ball of radius R. 


Fig. 12.5 


To apply (4.4) we must compute $*dG_R$ on the sphere of radius R. Now by 
(5.1) we have (for $x \ne 0$) 

$$A_{n-1} * dG_R(x, \cdot) = \frac{\sum (y^i - x^i) * dy^i}{\|y - x\|^n} - \frac{R^{n-2}}{\|x\|^{n-2}}\,\frac{\sum (y^i - x'^i) * dy^i}{\|y - x'\|^n}.$$

But, by (5.2), 

$$\frac{R^{n-2}}{\|x\|^{n-2}\,\|y - x'\|^n} = \frac{\|x\|^2}{R^2\,\|y - x\|^n} \qquad \text{for } \|y\| = R.$$

We thus see that for $\|y\| = R$, 

$$A_{n-1} * dG_R(x, \cdot) = \frac{1}{\|y - x\|^n}\sum\left[(y^i - x^i) - \frac{\|x\|^2}{R^2}\left(y^i - \frac{R^2}{\|x\|^2}\,x^i\right)\right] * dy^i = \frac{R^2 - \|x\|^2}{R^2\,\|y - x\|^n}\sum y^i * dy^i = \frac{R^2 - \|x\|^2}{R^2\,\|y - x\|^n} * r\,dr.$$

But $\iota_R^*(*r\,dr) = R\,dS_R$, where $dS_R$ is the volume element on the sphere of 
radius R. If we substitute into (4.6), we obtain 

$$u(x) = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R} \frac{u(y)}{\|y - x\|^n}\,dS_R \qquad \text{(the Poisson integral formula).} \tag{5.3}$$

In the proof of (5.3) we used the assumption that the function u is differentiable 
in some neighborhood of the ball $B_R$ and is harmonic for $\|x\| < R$. 
Actually, all that we need to assume is that u is differentiable and harmonic for 
$\|x\| < R$ and continuous on the closed ball $\|x\| \le R$. In fact, for any $\|x\| < R$, 
Eq. (5.3) will be valid with R replaced by $R_a$, where $\|x\| < R_a < R$. If we then 
let $R_a$ approach R, we recover (5.3) by virtue of the assumed continuity of u. 

Equation (5.3) gives the solution of Dirichlet’s problem for a ball provided 
we know that the solution exists. That is, if u is any function that is harmonic on 
the open ball and continuous on the closed ball, it satisfies (5.3). Now let us 
show that (5.3) is actually a solution of Dirichlet’s problem for prescribed 
boundary values. Thus suppose we are given a continuous function u defined on 
the sphere $S_R$. Then we are given u(y) for all $y \in S_R$. Define u(x) for $\|x\| < R$ 
by (5.3). We must show that 


a) u is harmonic for $\|x\| < R$, and 

b) $u(x) \to u(y_0)$ if $x \to y_0$ and $\|y_0\| = R$. 

To prove (a) we observe that $G_R(x, y)$ is a differentiable function of x and y 
in the range $\|x\| < R_1 < R$, $R_1 < \|y\| < R^2/R_1$, and is, by construction, a 
harmonic function of y. For $\|x\| < R$ and $\|y\| < R$, we know, by (4.4), that 

$$G_R(x, y) = G_R(y, x).$$

Thus for fixed y with $\|y\| < R$, $G_R(x, y)$ is a harmonic function of x on $B_R - \{y\}$. 
Letting $\|y\| \to R$, we see that $G_R(x, y)$ is a harmonic function of x for 
$\|x\| < R_1 < R$ for each fixed $y \in S_R$. Thus 

$$\frac{\partial G_R}{\partial y^i}(x, y)$$

is a harmonic function of x for each $y \in S_R$. In other words, all the coefficients 
of $*dG_R(\cdot, y)$ are harmonic functions of x for each $y \in S_R$, and therefore so is 
each coefficient of $u(y) * dG_R(\cdot, y)$. It follows that the function $u(x) = 
\int_{S_R} u * dG_R(x, \cdot)$ is a harmonic function of x, since the integral converges uniformly 
(as do the integrals of the various derivatives with respect to x) for 
$\|x\| < R_1 < R$. This proves (a). 

To prove (b) we first remark that the constant one is a harmonic function 
everywhere, so Eq. (5.3) applies to it. We thus have 

$$\frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R} \frac{dS_R}{\|y - x\|^n} = 1 \qquad \text{for any } \|x\| < R. \tag{5.4}$$

Now let $y_0$ be some point of $S_R$, and let u be a continuous function on $S_R$. 
For any $\epsilon > 0$ we can find a $\delta > 0$ such that 

$$|u(y) - u(y_0)| < \epsilon \qquad \text{for } \|y - y_0\| \le 2\delta \quad (y \in S_R).$$

Let $Z_1 = \{y \in S_R : \|y - y_0\| > 2\delta\}$ and $Z_2 = \{y \in S_R : \|y - y_0\| \le 2\delta\}$. 
Then by (5.3) and (5.4) we have, for $\|x\| < R$, 

$$u(x) - u(y_0) = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R} \frac{u(y) - u(y_0)}{\|y - x\|^n}\,dS_R,$$

so 

$$|u(x) - u(y_0)| \le I_1 + I_2,$$

where 

$$I_1 = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{Z_1} \frac{|u(y) - u(y_0)|}{\|y - x\|^n}\,dS_R$$

and 

$$I_2 = \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{Z_2} \frac{|u(y) - u(y_0)|}{\|y - x\|^n}\,dS_R.$$

Now if $\|y_0 - x\| < \delta$, then for all $y \in Z_1$ we have $\|y - x\| \ge \|y - y_0\| - 
\|x - y_0\|$, so that $\|y - x\| > \delta$. Thus for all x such that $\|x - y_0\| < \delta$ the 
integral occurring in $I_1$ is uniformly bounded. Since $\|x\| \to R$ as $x \to y_0$, we 
conclude that $I_1 \to 0$ as $x \to y_0$. 

With respect to $I_2$, we know that $|u(y) - u(y_0)| < \epsilon$ for all $y \in Z_2$, so that 

$$I_2 \le \frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{Z_2} \frac{\epsilon\,dS_R}{\|y - x\|^n} \le \epsilon\left(\frac{R^2 - \|x\|^2}{A_{n-1}R}\int_{S_R} \frac{dS_R}{\|y - x\|^n}\right) = \epsilon$$

by (5.4). This proves (b). 


We have thus proved: 


Theorem 5.1. Let u be a continuous function defined on the sphere $S_R$. 
There is a unique continuous function defined for $\|x\| \le R$ which coincides 
with the given function on the sphere $S_R$ and is harmonic for $\|x\| < R$. 
This function is given by (5.3) for all $\|x\| < R$. 
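
The Poisson integral formula is easy to test numerically in the plane, where n = 2, $A_1 = 2\pi$, and $dS_R = R\,d\theta$. The sketch below is an illustration under those assumptions. It feeds the boundary values of a known harmonic function into (5.3) and compares the result with the function's interior values. The example function, radius, and node count are arbitrary choices.

```python
import math

def harmonic_example(x, y):
    # Real part of exp(x + iy), harmonic in the whole plane.
    return math.exp(x) * math.cos(y)

def poisson_integral(boundary_values, R, point, n_nodes=2000):
    """Evaluate (5.3) for n = 2:
    u(x) = (R^2 - |x|^2) / (2 pi R) * integral over S_R of u(y) / |y - x|^2 dS_R."""
    px, py = point
    dtheta = 2 * math.pi / n_nodes
    total = 0.0
    for k in range(n_nodes):
        theta = (k + 0.5) * dtheta
        yx, yy = R * math.cos(theta), R * math.sin(theta)
        dist2 = (yx - px) ** 2 + (yy - py) ** 2
        total += boundary_values(yx, yy) / dist2 * R * dtheta  # dS_R = R dtheta
    return (R * R - (px * px + py * py)) / (2 * math.pi * R) * total

if __name__ == "__main__":
    R, x = 2.0, (0.5, -0.8)
    print(poisson_integral(harmonic_example, R, x), harmonic_example(*x))
    # Constant boundary data reproduces identity (5.4).
    print(poisson_integral(lambda a, b: 1.0, R, (1.2, 0.3)))
```

The midpoint rule is used because the integrand is smooth and periodic in the angle. The accuracy degrades as the evaluation point approaches the sphere, where the kernel becomes sharply peaked.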


Fig. 12.6 


6. CONSEQUENCES OF THE POISSON INTEGRAL FORMULA 


In the proof of Theorem 5.1 we assumed that u is continuous in the closed ball, 
possesses two continuous derivatives in the open ball, and is harmonic in the 
open ball. However, it is clear from (5.3) that u possesses derivatives of all 
orders when $\|x\| < R$. In fact, if $\|x\| < R_1 < R$, we can differentiate (5.3) 
under the integral sign any number of times, since all the integrals we obtain 
will be uniformly integrable in $\|x\|$. Now if u is harmonic in some open set U 
and $x \in U$, we can choose R sufficiently small so that the ball of radius R 
centered at x will be contained in U. (See Fig. 12.6.) We can then apply (5.3). 
We thus conclude: 


Proposition 6.1. Let u be a function defined on an open set U, possessing 
two continuous derivatives, and satisfying $\Delta u = 0$. Then u has continuous 
derivatives of all orders. 


We can improve Proposition 6.1 by examining (5.3) a little more closely. 
Let $\|y\| = R$, and let $R_1 < R$. Consider all x with $\|x\| \le R_1$. The multinomial 
theorem allows us to expand 

$$\frac{1}{\|y - x\|^n} = \sum a_{\alpha_1 \cdots \alpha_n}(y)\,(x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n}, \tag{6.1}$$

where the coefficients depend on y but the series converges uniformly in x for 
all $\|y\| = R$. Furthermore, if D is any operator of partial differentiation with 
respect to x, then we obtain an analogous expansion 

$$D\left(\|y - x\|^{-n}\right) = \sum b_{\alpha_1 \cdots \alpha_n}(y)\,(x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n},$$

where this series is obtained from (6.1) by term-by-term differentiation and 
converges uniformly in the same region. If we now substitute (6.1) into (5.3), 
we can integrate the series term by term because of the uniform convergence in 
y, and we conclude that 

$$u(x) = \sum c_{\alpha_1 \cdots \alpha_n}\,(x^1)^{\alpha_1} \cdots (x^n)^{\alpha_n},$$

where the series converges uniformly for $\|x\| \le R_1$. Furthermore, we can 
differentiate the series term by term. Doing this and evaluating at x = 0, 
we see that 

$$\alpha!\,c_\alpha = D^\alpha u(0),$$

where $\alpha = \langle\alpha_1, \ldots, \alpha_n\rangle$. 
We thus have proved: 


Proposition 6.2. Let u be harmonic in the ball $\|x\| < R$. Then 

$$u(x) = \sum \frac{1}{\alpha!}\,D^\alpha u(0)\,x^\alpha, \tag{6.2}$$

where the Taylor series (6.2) converges for all $\|x\| < R$ and converges 
uniformly for all $\|x\| \le R_1 < R$. In particular, u is determined throughout 
the ball by its value and the value of all its derivatives at the origin. 


Fig. 12.7 


Let U be some connected open subset of $E^n$, and let u and v be two harmonic 
functions on U. Suppose that u and v have the same value and the same derivatives 
of all orders at some point $x \in U$. Let y be some other point of U. We 
can then connect x to y by a series of balls lying in U, where the center of each 
ball lies in the interior of the preceding ball, and such that x is the center of the 
first ball and y lies in the last ball. (See Fig. 12.7.) We thus conclude that 

$$u(y) = v(y).$$

Thus we have: 


Proposition 6.3. Let u and v be harmonic functions defined on an open 
connected set U. Suppose that u and v, together with all their derivatives, 
coincide at some point of U. Then u = v on U. In particular, if u(x) = v(x) 
for all $x \in W$, where W is some open subset of U, then u(x) = v(x) for 
all $x \in U$. 


We continue to examine the consequences of (5.3). Let u be given by (5.3). 
Let $D^\alpha$ denote any operator of partial differentiation with respect to x. Thus 

$$D^\alpha = \frac{\partial^{\alpha_1 + \cdots + \alpha_n}}{(\partial x^1)^{\alpha_1} \cdots (\partial x^n)^{\alpha_n}}.$$

Then $D^\alpha u(x)$ is given by an integral over $S_R$ of a function involving, at worst, 
inverse powers of $\|y - x\|$ for y on $S_R$ and the function u on $S_R$. In particular, 
if $\|x\| \le R_1 < R$, then we can estimate the maximum absolute value of $D^\alpha u$ 
in terms of the values of u on $S_R$ and the difference $R - R_1$. In short, 

$$|D^\alpha u(x)| \le c(D^\alpha, R, R_1)\max_{\|y\| = R}|u(y)| \qquad \text{for } \|x\| \le R_1, \tag{6.3}$$

where c depends only on $\alpha$, R, and $R_1$. Now suppose that u is harmonic in some 
open set U, and let $K_1$ and $K_2$ be compact subsets of U with 

$$K_1 \subset \text{int}\,K_2 \subset K_2 \subset U.$$

About each $x \in K_1$ we can draw an open ball $B_R$ such that $\bar{B}_R \subset \text{int}\,K_2$, 
so that 

$$S_R \subset K_2.$$

We can also draw a ball $B_{R_1}$ of slightly smaller radius about x. Now the open 
balls $B_{R_1}$ cover $K_1$. Since $K_1$ is compact, we can select a finite number of these 
balls which cover $K_1$. By applying (6.3) to each of these balls, replacing $|u(y)|$ 
by the larger $\max_{z \in K_2}|u(z)|$, and using the largest of the finitely many constants 
c, we conclude that 

$$\max_{x \in K_1}|D^\alpha u(x)| \le c(\alpha, K_1, K_2)\max_{z \in K_2}|u(z)|. \tag{6.4}$$

In particular, we have: 


Proposition 6.4. Let $\{u_k\}$ be a sequence of harmonic functions defined on 
an open set U that converges uniformly on any compact subset of U. Then 
the sequence of partial derivatives $\{D^\alpha u_k\}$ also converges uniformly on any 
compact subset. In particular, the limit of the sequence is again a harmonic 
function. 


In fact, for any compact subset $K_1$ we can always find a compact subset $K_2$ 
such that $K_1 \subset \text{int}\,K_2$. Applying (6.4) to the harmonic functions $u_i - u_j$ 
establishes the uniform convergence of $\{D^\alpha u_k\}$ on $K_1$. Since the sequence of 
partial derivatives converges uniformly, $\Delta u = \lim \Delta u_k = 0$. 


7. HARNACK’S THEOREM 


We continue to reap the consequences of Eq. (5.3). In addition to what we 
assumed in the preceding section, let us suppose that $u(y) \ge 0$ for all $y \in S_R$. 


Fig. 12.8 


Now for $\|x\| = R_1 < R$ we have (see Fig. 12.8) 

$$R - R_1 \le \|y - x\| \le R + R_1 \qquad \text{for all } \|y\| = R.$$

Then, by (5.3), 

$$\frac{R^2 - R_1^2}{A_{n-1}R\,(R + R_1)^n}\int_{S_R} u\,dS_R \le u(x) \le \frac{R^2 - R_1^2}{A_{n-1}R\,(R - R_1)^n}\int_{S_R} u\,dS_R.$$

Now according to (2.5), the integrals occurring on the right and left of this 
inequality are equal to $A_{n-1}R^{n-1}u(0)$. Thus 

$$\frac{(R^2 - R_1^2)R^{n-2}}{(R + R_1)^n}\,u(0) \le u(x) \le \frac{(R^2 - R_1^2)R^{n-2}}{(R - R_1)^n}\,u(0). \tag{7.1}$$

Inequality (7.1) is known as Harnack’s inequality. It has the following consequence. 
Suppose that $\{u_k\}$ is a sequence of functions satisfying (5.3) and 
such that 

$$u_1(y) \le u_2(y) \le \cdots \le u_k(y) \le \cdots \qquad \text{for all } y \in S_R.$$

If we apply (7.1) to the functions $u_i - u_j$ ($i \ge j$), we conclude that 

$$|u_i(x) - u_j(x)| \le a(R, R_1)\,|u_i(0) - u_j(0)| \qquad \text{for all } \|x\| \le R_1,$$

where $a(R, R_1)$ depends only on R and $R_1$. 
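
Harnack's inequality (7.1) can be checked on an explicit example. The sketch below is an illustration in the plane (n = 2), where the bounds reduce to $(R - R_1)/(R + R_1)\,u(0)$ and $(R + R_1)/(R - R_1)\,u(0)$. The test function $3 + e^x\cos y$ is an arbitrary choice that is harmonic and positive on the closed unit disk. The helper names are hypothetical.

```python
import math

def harnack_bounds(u0, R, R1, n):
    """Lower and upper bounds of (7.1) for a nonnegative harmonic function with u(0) = u0."""
    factor = (R * R - R1 * R1) * R ** (n - 2)
    return factor / (R + R1) ** n * u0, factor / (R - R1) ** n * u0

def u(x, y):
    # Harmonic, and positive on the closed unit disk since cos y > 0 for |y| <= 1.
    return 3.0 + math.exp(x) * math.cos(y)

def check_harnack(R=1.0, R1=0.5, n_angles=360):
    """Return True if every sampled point with |x| = R1 satisfies (7.1)."""
    lower, upper = harnack_bounds(u(0.0, 0.0), R, R1, 2)
    for k in range(n_angles):
        t = 2 * math.pi * k / n_angles
        value = u(R1 * math.cos(t), R1 * math.sin(t))
        if not (lower <= value <= upper):
            return False
    return True

if __name__ == "__main__":
    print(harnack_bounds(u(0.0, 0.0), 1.0, 0.5, 2))
    print(check_harnack())
```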

In particular, if the sequence converges at. the origin, it converges uniformly 
for |[z|| < Ry < #. By applying our usual device of joining two points by a 
sequence of balls, we deduce: 


Proposition 7.1 (Harnack’s theorem). Let $\{u_k\}$ be a sequence of harmonic 
functions defined on a connected open set U and such that 

$$u_1(x) \le u_2(x) \le \cdots \qquad \text{for all } x \in U. \tag{7.2}$$

Suppose that the sequence $\{u_k(p)\}$ converges for some $p \in U$. Then the 
sequence of functions $\{u_k\}$ converges uniformly on any compact subset of U 
and, by Proposition 6.4, the limit function is again harmonic. 


A useful variation of Harnack’s theorem is: 


Proposition 7.2. Let $\{u_k\}$ be a sequence of harmonic functions defined on 
an open set U and satisfying (7.2). Suppose that there is a constant M such 
that $u_k(x) \le M$ for all k and all $x \in U$. Then the sequence of functions 
$\{u_k\}$ converges uniformly on any compact subset of U to a harmonic 
function. 


To prove Proposition 7.2 we remark that this time the convergence of 
$\{u_k(x)\}$ at each $x \in U$ is automatic because it is a monotone (nondecreasing) 
sequence of bounded real numbers. Proposition 7.1 guarantees that the convergence 
is uniform on compact subsets to a harmonic limit function. 


8. SUBHARMONIC FUNCTIONS 


In this section and the next we shall show that Dirichlet's problem can be solved
for bounded open sets $U$ whose boundaries satisfy certain regularity conditions.
The proof that we shall present (there are many others) is due to Perron and
makes essential use of the concept of a subharmonic function. Let $u$ be a function
defined on the open set $U$. We say that $u$ is subharmonic if

a) $u$ is continuous; and

b) for any connected open subset $W$ of $U$ and any harmonic function $v$
defined on $W$, the function $u - v$ satisfies the maximum principle on $W$.

In other words, for any such $W$ and $v$, if there is some $x_0 \in W$ with

$$u(x) - v(x) \le u(x_0) - v(x_0) \quad\text{for all } x \in W,$$

then $u(x) - v(x) = u(x_0) - v(x_0)$ in $W$.

In order to understand condition (b) a little better, we study some of its
consequences. First of all, we can let $v$ be the harmonic function that is
identically zero. Then (b) says that $u$ satisfies the maximum principle on every open
subset of $U$.

Next we let $B_z$ be some ball with center $z$ whose closure is contained in $U$.
In particular, its boundary, $S_z$, is contained in $U$, and the function $u$ is
continuous on $S_z$. We can therefore find a harmonic function $v$ defined on the open
ball and taking on the values $u(y)$ for $y \in S_z$. We take $W$ to be the open ball
and $v$ to be the function we have just constructed. Then $u - v$ vanishes on $S_z$,
so (b) implies that $u(x) \le v(x)$ for all $x \in W$. In particular, $u(z) \le v(z)$. But

$$v(z) = \frac{1}{A_{n-1}R^{n-1}} \int_{S_z} u\,dS.$$

We have thus shown that if $u$ is a subharmonic function defined on $U$ and $S_z$
is a sphere of radius $R$ with center $z$ lying in $U$, then

$$u(z) \le \frac{1}{A_{n-1}R^{n-1}} \int_{S_z} u\,dS.$$




In other words, a subharmonic function is always less than or equal to its aver- 
age value over a sphere about a point. This property is frequently taken as the 
definition of a function’s being subharmonic, and the hypothesis of continuity is 
somewhat relaxed. However, the definition we gave above is more suitable for 
our present purposes. 

Let $w_1$ and $w_2$ be two subharmonic functions defined on an open set $U$.
Define the function $w_1 \vee w_2$ by setting

$$(w_1 \vee w_2)(x) = \max\,[w_1(x), w_2(x)].$$

The function $w_1 \vee w_2$ is again subharmonic. The fact that $w_1 \vee w_2$ is
continuous is established as follows: Let $x$ be any point of $U$. Then $(w_1 \vee w_2)(x)$ is
either $w_1(x)$ or $w_2(x)$, say $w_1(x)$. Since $w_1$ is continuous, for any $\epsilon > 0$, we can
find a $\delta > 0$ such that $|w_1(y) - w_1(x)| < \epsilon$ for $\|y - x\| < \delta$. Similarly, we
can arrange to have $w_2(y) < w_2(x) + \epsilon$ for that same range of $y$, so that
$w_2(y) < w_1(x) + \epsilon$, since $w_1(x) = \max\,[w_1(x), w_2(x)]$. For these values of $y$
we thus have

$$w_1(x) - \epsilon < w_1(y) \le (w_1 \vee w_2)(y) < w_1(x) + \epsilon,$$

so that $w_1 \vee w_2$ is continuous. Now let $W$ be a connected open subset of $U$
and let $v$ be a harmonic function on $W$. Suppose that $w_1 \vee w_2 - v$ takes on
its maximum value at some point $x_0$ of $W$. Suppose that $(w_1 \vee w_2)(x_0) =
w_1(x_0)$. Then for all $x \in W$

$$w_1(x) - v(x) \le (w_1 \vee w_2)(x) - v(x) \le w_1(x_0) - v(x_0).$$

Since $w_1$ is subharmonic, the right and left sides of this inequality must be equal.
Thus $w_1 \vee w_2 - v$ satisfies the maximum principle on $W$, which shows that
$w_1 \vee w_2$ is subharmonic.
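The sub-mean-value property derived earlier gives a quick numerical sanity check on this claim. The sketch below is only an illustration under assumed data: the two functions `w1` and `w2`, the sample centers, and the radii are hypothetical choices, and circle averages are approximated by a finite sum.

```python
import math

def w1(x, y):
    return x * x - y * y          # harmonic, hence subharmonic

def w2(x, y):
    return x * x + y * y - 1.0    # Laplacian equals 4 > 0, hence subharmonic

def w_max(x, y):
    return max(w1(x, y), w2(x, y))

def circle_average(f, cx, cy, r, m=2000):
    """Approximate the average of f over the circle of radius r about (cx, cy)."""
    total = 0.0
    for k in range(m):
        t = 2 * math.pi * k / m
        total += f(cx + r * math.cos(t), cy + r * math.sin(t))
    return total / m

centers = [(0.0, 0.0), (0.7, 0.1), (-0.3, 0.8), (1.0, -0.5)]
# Each gap is (circle average) - (value at the center); subharmonicity predicts gap >= 0.
gaps = [circle_average(w_max, cx, cy, r) - w_max(cx, cy)
        for (cx, cy) in centers for r in (0.1, 0.5, 1.0)]
```

Nonnegative gaps are consistent with the statement that $w_1 \vee w_2$ lies below its spherical averages; they do not replace the proof, which works with the maximum principle directly.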

Let $w$ be a subharmonic function defined on an open set $U$, and let $B$ be a
ball whose closure is contained in $U$. Let $u$ be the solution of Dirichlet's problem
for the ball $B$ with boundary values $w$. Thus $u$ is the unique function continuous
in the closed ball, harmonic in the open ball, and coinciding with $w$ on the
boundary $S$ of $B$. As we have already observed, $w(x) \le u(x)$ for $x \in B$. Now
define the function $w_B$ by setting

$$w_B(x) = \begin{cases} w(x) & \text{for } x \in U - B, \\ u(x) & \text{for } x \in B. \end{cases}$$

We claim that the function $w_B$ is again subharmonic. The proof is just as before.
The continuity is obvious, as before. If $W$ is a connected open subset of $U$ and $v$ is a harmonic
function on $W$, then $w_B - v$ cannot achieve a strict maximum at any interior
point: suppose $w_B(x) - v(x) \le w_B(x_0) - v(x_0)$ for all $x \in W$. (See Fig. 12.9.)
If $x_0 \in U - B$, then we have $w(x) - v(x) \le w_B(x) - v(x) \le w_B(x_0) - v(x_0) =
w(x_0) - v(x_0)$. Since $w$ is subharmonic, all these inequalities must be equalities.
On the other hand, if $x_0 \in B$, then since $w_B$ is harmonic in the open ball, we must
have $w_B(x) - v(x) = w_B(x_0) - v(x_0)$ for all $x \in W \cap B$. By continuity, this implies




Fig. 12.9 


that $w_B(y) - v(y) = w_B(x_0) - v(x_0)$ for $y \in W \cap S$. If $W \cap S \ne \emptyset$, we can
take $y$ as a new $x_0$ and the problem is reduced to the previous case.

In short, to every subharmonic function $w$ defined on $U$ we have attached
another subharmonic function $w_B$ such that $w_B$ is actually harmonic in the
interior of $B$ and coincides with $w$ in $U - B$, and such that $w \le w_B$ throughout
$U$. It is clear from the method of construction that if $w_1$ and $w_2$ are two
subharmonic functions defined on $U$ with $w_1 \le w_2$, then

$$w_{1B} \le w_{2B}.$$
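The construction $w \mapsto w_B$ is easiest to picture in one dimension, where harmonic functions are affine, subharmonic functions are convex, and a "ball" is an interval. The sketch below is a one-dimensional analogue only, not the construction in $\mathbb{E}^n$; the functions `w1`, `w2`, the interval endpoints, and the helper name `lift` are assumptions made for illustration.

```python
def lift(w, a, b):
    """One-dimensional analogue of w_B for B = (a, b): keep w outside B and
    replace it inside B by the chord, i.e. the affine (harmonic) function
    with the same boundary values."""
    wa, wb = w(a), w(b)
    def w_B(x):
        if a < x < b:
            return wa + (wb - wa) * (x - a) / (b - a)
        return w(x)
    return w_B

w1 = lambda x: x * x - 1.0        # convex, so subharmonic in one dimension
w2 = lambda x: abs(x) + x * x     # convex, and w1 <= w2 everywhere
a, b = -0.5, 1.0
w1B, w2B = lift(w1, a, b), lift(w2, a, b)

grid = [-2.0 + 4.0 * k / 400 for k in range(401)]
dominates = all(w1(x) <= w1B(x) + 1e-12 for x in grid)      # w <= w_B
monotone = all(w1B(x) <= w2B(x) + 1e-12 for x in grid)      # w1 <= w2 gives w1B <= w2B
# Midpoint convexity of the lifted function on the sample grid.
still_convex = all(w1B(0.5 * (x + y)) <= 0.5 * (w1B(x) + w1B(y)) + 1e-12
                   for x in grid for y in grid)
```

In this analogue the three properties stated in the text (the modified function is still subharmonic, it dominates $w$, and the operation is monotone) correspond to the three flags computed at the end.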


9. DIRICHLET’S PROBLEM 


Let $U$ be an open subset of $\mathbb{E}^n$ with $\bar U$ compact. We say that $U$ has a touchable
boundary if for every $p \in \partial U$ there is a ball $B$ such that $\bar B \cap \bar U = \{p\}$. Thus
Fig. 12.10(a) represents a domain with a touchable boundary, while Fig. 12.10(b)
and 12.10(c) represent domains that have untouchable points on their boundaries.


(a) (b) (c)
Fig. 12.10


Let $U$ be an open subset of $\mathbb{E}^n$ with $\bar U$ compact and with touchable boundary.
Let $f$ be a bounded function defined on $\partial U$, and suppose that $|f(p)| \le M$ for all
$p \in \partial U$. Let $W_f$ denote the class of all functions defined on $U$ which satisfy the
following two conditions:

a) each $w \in W_f$ is subharmonic; and

b) for each $p \in \partial U$,

$$\limsup_{\substack{x \to p \\ x \in U}} w(x) \le f(p).$$




We can rephrase condition (b) as follows: For each $p \in \partial U$ and each $\epsilon > 0$
there is a $\delta > 0$ (which depends on $p$, $w$, and $\epsilon$) such that

$$w(x) < f(p) + \epsilon \quad\text{for } \|x - p\| < \delta.$$

Note that the family of functions $W_f$ is nonempty, since the constant function
which is identically $-M$ clearly belongs to $W_f$. Also note that condition (b)
implies that $\limsup w(x) \le M$ as $x \to \partial U$. Since $w \in W_f$ is subharmonic, we
conclude from the maximum principle that $w \le M$ for all $w \in W_f$.

Now define the function $u_f$ by setting

$$u_f(x) = \operatorname*{lub}_{w \in W_f} w(x).$$

In view of the preceding remarks, the function $u_f$ is well defined and, in fact,
$|u_f(x)| \le M$ for all $x \in U$. We shall show that if $f$ is continuous on $\partial U$, the
function $u_f$ is the solution of the Dirichlet problem for $U$. We must thus show
that $u_f$ is harmonic and takes on the boundary values $f$.

We first show, without any continuity restrictions on $f$, that $u_f$ is harmonic.
To do this it is sufficient to show that $u_f$ is harmonic in any open ball $B$, where
$\bar B \subset U$. Let $x$ be some point of $B$. Let $w_1, w_2, \ldots, w_i, \ldots$ be a sequence of
functions belonging to $W_f$ such that $\lim w_i(x) = u_f(x)$. (Such a sequence
exists by the definition of $u_f$.) Now define

$$w_i' = w_1 \vee w_2 \vee \cdots \vee w_i.$$

Then $w_1' \le w_2' \le \cdots \le w_i' \le \cdots$ is a monotone increasing sequence of
functions belonging to $W_f$ with $\lim w_i'(x) = u_f(x)$. Now replace each $w_i'$ by $w_{iB}'$.
The function $w_{iB}'$ belongs to $W_f$, since it is subharmonic and coincides with $w_i'$
near $\partial U$. Furthermore, since $w \le w_B$ for any subharmonic function, we can
conclude that $\lim w_{iB}'(x) = u_f(x)$. Inside the ball $B$, each of the functions $w_{iB}'$
is harmonic, so that the sequence $\{w_{iB}'\}$ is a monotone nondecreasing sequence
of harmonic functions in $B$. Since all the functions of $W_f$ are bounded by $M$,
we conclude from Proposition 7.2 that the sequence $\{w_{iB}'\}$ converges uniformly
on any compact subset of $B$ to a function $u$ which is harmonic in $B$, and we have
$u(x) = u_f(x)$. Furthermore, by definition, we have $u(y) \le u_f(y)$ for any $y \in B$.
We would like to show that we actually have $u(y) = u_f(y)$ for all such $y$. To this
effect, we follow the same procedure at $y$, namely, we select a sequence $\{v_{iB}\}$
where each $v_{iB}$ is harmonic in $B$, belongs to $W_f$, and is such that the sequence is
monotone nondecreasing and $\lim v_{iB}(y) = u_f(y)$. In this way we obtain another
function $v$ which is harmonic in $B$ and satisfies $v \le u_f$ throughout $B$, and
$v(y) = u_f(y)$. Finally, let the functions $s_i$ be defined by

$$s_i = w_{iB}' \vee v_{iB},$$




and consider the functions $s_{iB}$. On the one hand, they all belong to $W_f$ and are
harmonic in $B$. Furthermore, we have $w_{iB}' \le s_{iB}$ and $v_{iB} \le s_{iB}$. Finally, the
sequence $\{s_{iB}\}$ is nondecreasing and therefore converges to a harmonic function
$s$ in $B$. By construction, we have

$$u \le s \quad\text{and}\quad v \le s$$

throughout $B$, while $u(x) = s(x) = u_f(x)$ and $v(y) = s(y) = u_f(y)$. Applying
the maximum principle to the harmonic functions $u - s$ and $v - s$, we conclude
(since the maximum value 0 is achieved at $x$ and $y$, respectively) that $s =
u = v$ in all of $B$. We thus see that $u(y) = u_f(y)$ for any $y \in B$, which shows
that $u_f$ is harmonic in $U$. Note that in proving that $u_f$ is harmonic we did not
make use of any properties of the boundary of $U$ or any continuity assumptions
on $f$.


Fig. 12.11 


Now let $p \in \partial U$ be a touchable point of the boundary, and assume that $f$ is
continuous at $p$. We shall show that $u_f(x) \to f(p)$ as $x \to p$. Let the ball $B_R$
of radius $R$ with center $z$ touch $\bar U$ at $p$, that is, $\bar B_R \cap \bar U = \{p\}$. (See Fig. 12.11.)
Since $f$ is continuous at $p$, for any $\epsilon > 0$ we can find an $R_1 > R$ such that

$$|f(p) - f(q)| < \epsilon \quad\text{for all } q \in \partial U \cap B_{R_1}. \tag{9.1}$$

Define the function $b$ by setting $b(x) = \|x - z\|^{2-n} - R^{2-n}$. (If $n = 2$, we take
$b(x) = \ln(R/\|x - z\|)$ instead.) Note that $b$ is
defined and harmonic on $\mathbb{E}^n - \{z\}$ and is negative for $\|x - z\| > R$. Furthermore,
$b(x) < K < 0$ if $\|x - z\| > R_1$. In particular, for all $q \in \partial U$ we have

$$f(q) < f(p) + \epsilon + \frac{2M}{K}\,b(q). \tag{9.2}$$

In fact, this is just (9.1) if $\|q - z\| < R_1$, and it follows from $|f(q)| \le M$ if
$\|q - z\| \ge R_1$. By the maximum principle for subharmonic functions we
conclude that $w(x) \le (2M/K)\,b(x) + \epsilon + f(p)$ for any $w \in W_f$ and any $x \in U$.
We thus have

$$u_f(x) \le \frac{2M}{K}\,b(x) + f(p) + \epsilon \quad\text{for any } x \in U.$$

On the other hand, we have, for the same reason as before,

$$f(p) - \epsilon - \frac{2M}{K}\,b(q) < f(q) \quad\text{for any } q \in \partial U,$$




so that the harmonic function $f(p) - \epsilon - (2M/K)\,b$ actually belongs to $W_f$.
In particular, we have

$$f(p) - \epsilon - \frac{2M}{K}\,b(x) \le u_f(x) \quad\text{for any } x \in U.$$

Putting the two inequalities together, we conclude that

$$|u_f(x) - f(p)| \le \frac{2M}{K}\,b(x) + \epsilon \quad\text{for all } x \in U. \tag{9.3}$$

Now the function $b$ is continuous and vanishes at $p$. Thus by choosing $x$
sufficiently close to $p$ we can arrange that the right-hand side of (9.3) is less than
$2\epsilon$. Thus $u_f(x) \to f(p)$ as $x \to p$.
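The three properties of the barrier $b$ used in this argument (harmonic away from $z$, zero on the touching sphere, negative outside it) can be checked numerically. The following is a rough sketch under assumed data: the center, radius, sample points, and the finite-difference step are illustrative choices, and the Laplacian is only approximated by second differences.

```python
import math

def make_barrier(z, R, n):
    """b(x) = ||x - z||^(2-n) - R^(2-n), or ln(R / ||x - z||) when n = 2."""
    def b(x):
        r = math.dist(x, z)
        return math.log(R / r) if n == 2 else r**(2 - n) - R**(2 - n)
    return b

def laplacian(f, x, h=1e-3):
    """Second-order central-difference approximation of the Laplacian of f at x."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += f(xp) - 2.0 * f(x) + f(xm)
    return total / (h * h)

b3 = make_barrier([0.0, 0.0, 0.0], 1.0, 3)   # hypothetical ball: center 0, R = 1, n = 3
b2 = make_barrier([0.0, 0.0], 1.0, 2)        # the logarithmic case n = 2

outside3 = [1.5, 0.2, -0.7]
outside2 = [2.0, 1.0]
lap3 = laplacian(b3, outside3)
lap2 = laplacian(b2, outside2)
on_sphere = b3([1.0, 0.0, 0.0])
```

Both approximate Laplacians should be close to zero, `on_sphere` should vanish, and `b3(outside3)` and `b2(outside2)` should be negative, matching the sign conventions needed in (9.2) and (9.3).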

In particular, if all points of $\partial U$ are touchable, and if $f$ is continuous at all
points of the boundary, we have proved:


Theorem 9.1. Let $U$ be an open subset of $\mathbb{E}^n$ with compact closure and
touchable boundary. Let $f$ be a continuous function defined on $\partial U$. Then
there exists a unique function $u_f$ which is continuous on $\bar U$ and harmonic
in $U$, and which coincides with $f$ on $\partial U$.


Some remarks concerning Theorem 9.1 and its proof are in order.


a) The only time we used the assumption that $\partial U$ is touchable was when
we constructed a function $b$ which was subharmonic in $U$, vanished at $p$, and took
on values less than some negative constant on all points of $\partial U$ outside some
neighborhood of $p$. Now a more careful analysis will show that we can always
construct such a function so long as we can touch $p$ from the outside by a cone.
Thus we can solve the Dirichlet problem even at points like the $p$ in Fig. 12.12.


Fig. 12.12


b) On the other hand, some condition on the boundary is necessary. For
instance, if $\partial U$ contains an isolated point $p$, then we cannot assign the value at $p$
arbitrarily. In fact, if $u$ is continuous in a neighborhood $W$ of $p$ and harmonic in
$W - \{p\}$, then the Poisson integral formula implies that the various derivatives
of $u$ will also be bounded in $W - \{p\}$. Therefore, the proof of the mean-value
theorem applies to the point $p$ itself, and so $u(p)$ is determined and cannot be
assigned arbitrarily. More generally, a more delicate argument shows that
the Dirichlet problem cannot be solved for domains whose boundaries
contain spikes pointing inward or (in dim $\ge 3$) sufficiently sharp cusps (as in




Fig. 12.13 


Fig. 12.13). This is the mathematical analogue of the physical fact that a very
sharp conductor cannot hold a charge, but will spark. (The relation with
electrostatics will be discussed in Section 12.)


c) For the purpose of applying the results of Section 4, i.e., the construction
of the Green function of the domain and the various identities, Theorem 9.1
is still not quite enough. We need to know not only that the Green function
exists, but also that it has continuous derivatives up to the boundary. This fact
requires further argument and additional assumptions about the nature of the
boundary; it will be discussed in the next section.


Fig. 12.14


10. BEHAVIOR NEAR THE BOUNDARY 


The purpose of this section is to discuss the behavior of the solution of the
Dirichlet problem near the boundary. In fact, we shall prove:


Theorem 10.1. Let $U$ be a domain with regular boundary and compact
closure in $\mathbb{E}^n$. Let $f$ be a continuously differentiable function defined on $\partial U$,
and let $u$ be the solution of the Dirichlet problem with boundary values $f$.
Then the partial derivatives $\partial u/\partial x^i$ can be extended to continuous functions
on $\bar U$. Furthermore, if $p \in \partial U$ and $\xi = \langle \xi^1, \ldots, \xi^n \rangle$ is tangent to
the boundary at $p$, then

$$\langle \xi, df \rangle = \langle \xi, du \rangle = \sum_i \frac{\partial u}{\partial x^i}\,\xi^i.$$


The following proof was suggested to us by Professor Ahlfors and is
reproduced here with his kind permission.


Proof. Let $p$ be some point of $\partial U$, and let us arrange, by a Euclidean motion,
that the tangent plane to $\partial U$ at $p$ is the plane $x^n = 0$. (See Fig. 12.14.) Then
near $p$ the points of $\partial U$ are all points of the form $(x^1, \ldots, x^{n-1}, \varphi(x^1, \ldots, x^{n-1}))$,




where $\varphi$ is a function defined near the origin of $\mathbb{R}^{n-1}$ and vanishing at the origin,
together with all its first derivatives. In particular, there is a constant $C > 1$
such that

$$|\varphi(x^1, \ldots, x^{n-1})| \le C\{(x^1)^2 + \cdots + (x^{n-1})^2\}$$

in some neighborhood of the origin of $\mathbb{R}^{n-1}$. We can therefore choose a
sufficiently small $R < 1/(2C)$ so that the (open) ball of radius $R$ with center
$(0, \ldots, 0, R)$ lies entirely within $U$ and the ball of radius $R$ with center
$(0, \ldots, 0, -R)$ lies inside the complement of $\bar U$. (See Fig. 12.15.)

Let $z = (0, \ldots, 0, -R)$. Then for all $q \in \partial U$ sufficiently close to $p$ we have

$$\begin{aligned}
\|q - z\|^2 &= (q^1)^2 + \cdots + (q^{n-1})^2 + R^2 + 2R\varphi(q) + \varphi(q)^2 \\
&\ge (1 - 2RC)\{(q^1)^2 + \cdots + (q^{n-1})^2\} + R^2 + \varphi(q)^2,
\end{aligned}$$

so that

$$\|q - z\| - R \ge k\{(q^1)^2 + \cdots + (q^{n-1})^2\},$$

where $k$ is some positive constant. Let $\psi(r) = r^{2-n}$ ($= \ln r$ if $n = 2$). Then
$\psi'(R) \ne 0$, and therefore $|\psi(\|q - z\|) - \psi(R)| \ge C'[\|q - z\| - R]$ for a
suitable $C' > 0$. In particular, we have the inequality

$$\bigl|\,\|q - z\|^{2-n} - R^{2-n}\bigr| \ge K\{(q^1)^2 + \cdots + (q^{n-1})^2\} \tag{10.1}$$

or

$$\bigl|\ln \|q - z\| - \ln R\bigr| \ge K\{(q^1)^2 + \cdots + (q^{n-1})^2\} \quad\text{if } n = 2,$$

for some positive constant $K$.

Now let us replace the function $f$ by $f - [f(p) + \sum_{i=1}^{n-1} (\partial f/\partial x^i)(p)\,x^i]$.
Then $u - [f(p) + \sum_{i=1}^{n-1} (\partial f/\partial x^i)(p)\,x^i]$ is the solution of the corresponding
Dirichlet problem. In this way we may assume (changing our notation
accordingly) that $f$, together with its first derivatives, vanishes at $p$. Since $f$ is assumed
to have continuous first derivatives, we may apply Taylor's theorem to conclude
that there exists a $c > 0$ such that for all $q$ on $\partial U$ sufficiently close to $p$ we have

$$|f(q)| \le c\{(q^1)^2 + \cdots + (q^{n-1})^2\}. \tag{10.2}$$

As in the last section, let

$$b(x) = \|x - z\|^{2-n} - R^{2-n} \qquad \Bigl(= \ln \frac{R}{\|x - z\|} \ \text{ if } n = 2\Bigr).$$

If we compare (10.2) with (10.1), we see that for all $q \in \partial U$ sufficiently close to
$p$ we have

$$|f(q)| \le \frac{c}{K}\,|b(q)|.$$

On the other hand, since $b$ is strictly negative outside some neighborhood of $p$




Fig. 12.15 Fig. 12.16 


(on $\partial U$), and since $f$ is bounded on $\partial U$, we can (using the above inequality for
all $q$ near $p$) find a constant $A$ such that

$$|f(q)| \le A\,|b(q)| \quad\text{for all } q \in \partial U. \tag{10.3}$$

Since the function $b$ is harmonic in $U$, the maximum principle applied to $u + Ab$
and $Ab - u$ (recall that $b \le 0$ on $\bar U$) allows us to conclude that

$$|u(x)| \le A\,|b(x)| \quad\text{for all } x \in U. \tag{10.4}$$

Now the function $b$ is a differentiable function of the distance from $z$ which
vanishes when this distance is $R$. In particular, there is a constant $a > 0$
such that

$$|b(x)| \le a\,d(x),$$

where $d(x)$ denotes the distance from $x$ to the sphere of radius $R$ with center $z$.
Now let $B$ be the ball of radius $R$ with center $-z = (0, \ldots, 0, R)$, so that $B$
lies in $U$, and let $S = \partial B$. Note that $S$ is tangent to the sphere about $z$ at the
point $p$. (See Fig. 12.16.) Thus for points $y$ on $S$, $d(y) \le c_1\|y - p\|^2$. If we
substitute this into (10.4) using the previous inequality, we conclude that

$$|u(y)| \le D\,\|y - p\|^2 \quad\text{for all } y \in S, \tag{10.5}$$

for some constant $D$. We apply the Poisson integral formula to the ball $B$ and
the function $u$ and differentiate with respect to $x^i$ to obtain

$$\frac{\partial u}{\partial x^i} = \frac{-2(x^i + z^i)}{A_{n-1}R} \int_S \frac{u(y)}{\|x - y\|^n}\,dS \;-\; \frac{n\,(R^2 - \|x + z\|^2)}{A_{n-1}R} \int_S \frac{u(y)\,(x^i - y^i)}{\|x - y\|^{n+2}}\,dS. \tag{10.6}$$




Now let $x$ be on the normal to the boundary through $p$, so that $x = (0, \ldots, 0, x^n)$.
If $|x^n| < R$, then

$$\|y - p\| \le \|y - x\| + |x^n| \le 2\,\|y - x\|.$$

Thus

$$\int_S \frac{|u(y)|}{\|x - y\|^n}\,dS \le 2^n \int_S \frac{|u(y)|}{\|y - p\|^n}\,dS \le 2^n D \int_S \|y - p\|^{2-n}\,dS.$$

Since the sphere is $(n - 1)$-dimensional, this last integral converges absolutely,
and we thus see that the first integral in (10.6) is uniformly absolutely
convergent. (Note that the term containing this integral vanishes if $i < n$.) We now
show that the second term in (10.6) tends to zero as $x^n$ tends to zero. In fact,
$R^2 - \|x + z\|^2 = x^n(2R - x^n)$, so we can write the second term of (10.6) as

$$\frac{-n\,(2R - x^n)\,(x^n)^{1/2}}{A_{n-1}R} \int_S \frac{(x^n)^{1/2}\,u(y)\,(x^i - y^i)}{\|x - y\|^{n+2}}\,dS.$$

This time, since $x^n \le \|x - y\|$ for all $y \in S$, we can assert that the integrand
is smaller in absolute value than

$$\frac{|u(y)|\,|x^i - y^i|\,(x^n)^{1/2}}{\|x - y\|^{n+2}} \le \frac{|u(y)|}{\|x - y\|^{n+1/2}} \le \frac{2^{n+1/2}\,|u(y)|}{\|y - p\|^{n+1/2}} \le D\,2^{n+1/2}\,\|y - p\|^{3/2-n},$$

which is again absolutely integrable. Therefore, the integral occurring in the
last expression is uniformly bounded for all values of $x^n > 0$. Since $(x^n)^{1/2} \to 0$,
we conclude that the second term of (10.6) approaches zero.

If we now rephrase the result we have obtained independently of the
special choice of coordinates, we see that we have proved the following: Let $p$
be any point of $\partial U$, and let $x$ be a point on the normal line to $\partial U$ through $p$.
If $\xi$ is any vector parallel to a tangent vector at $p$, then $\langle \xi, du \rangle \to \langle \xi, df \rangle$ as
$x \to p$ along the normal line. If $\partial/\partial n$ denotes the unit vector in the normal
direction (pointing into $U$), then

$$\Bigl\langle \frac{\partial}{\partial n}, du \Bigr\rangle \to \frac{2}{A_{n-1}} \int \frac{u(y) - u(p)}{\|y - p\|^n}\,dS, \tag{10.7}$$

where the integral is taken over the sphere of radius $R$ tangent to $\partial U$ at $p$ and
lying inside $\bar U$.

So far, we have proved convergence only if $x \to p$ along the normal direction.
If we go back and examine the argument, we see that the radius $R$ and the
constant $D$ in (10.5) can be chosen uniformly for all $p \in \partial U$. (In fact, since $\partial U$
is compact, we can cover $\partial U$ by a finite number of neighborhoods, of type (iii),
such that a ball of radius $r$ about each $p \in \partial U$ lies in one of these neighborhoods,
where $r$ is sufficiently small. In such a situation it is clear that the choice of $R$
(still smaller, if necessary) depends only on the second derivatives of the change
of charts from Euclidean coordinates to the adjusted charts. Also the choice
of $D$ depends only on the constants involved in the Taylor expansion of $f$ and
on $R$. Thus the choice of $R$ and $D$ can be made uniformly.) Since, by
assumption, the partial derivatives of $f$ are continuous and the right-hand side of (10.7)
is continuous in $p$, we conclude that the limits of the partial derivatives of $u$
exist as we approach the boundary in any direction, and that in fact we have
proved Theorem 10.1. □


We can now construct a Green function for any domain with regular
boundary by the solution of Dirichlet's problem, as in Section 4. Theorem 10.1
tells us that it is differentiable in the closed set $\bar U$ in the sense that the partial
derivatives are continuous up to the boundary. If we want to derive the various
formulas of Section 4, we can do so by applying a limit argument: We simply
replace $U$ by a smaller domain $U^\alpha$ such that $U^\alpha \to U$ and $\partial U^\alpha \to \partial U$ as $\alpha \to 0$.

(Actually, a more careful and delicate argument will show that if $f$ has two
continuous derivatives, then the $\partial u/\partial x^i$ we obtain as limits on the boundary are
in fact continuously differentiable. Since $\partial u/\partial x^i$ is the solution of the Dirichlet
problem with these boundary values, we conclude that the second derivatives
of $u$ can be extended so as to be continuous on $\bar U$. In this way one can prove that
if $f$ is $C^\infty$, all the derivatives of $u$ can be extended so as to be continuous on $\bar U$.
In particular, all the derivatives of the Green function $G(x, \cdot)$ can be extended
to continuous functions at the boundary for any $x \in U$.)


11. DIRICHLET’S PRINCIPLE 


The solution of Dirichlet's problem has an interesting and useful
characterization as the solution of the problem of minimizing the Dirichlet integral
$D[u, u] = \int_U \sum_i (\partial u/\partial x^i)^2$ with given boundary values. More precisely,


Theorem 11.1 (Dirichlet's principle). Let $U$ be a domain with compact
closure and regular boundary, and let $f$ be a continuously differentiable
function on $\partial U$. Let $u$ be the solution of the Dirichlet problem with
boundary values $f$, and let $w$ be any function which is differentiable in $U$
and takes on the values of $f$ on the boundary, and whose derivatives can
be extended to be continuous in $\bar U$. Then

$$D[u, u] \le D[w, w], \tag{11.1}$$

and equality holds if and only if $u = w$.


Proof. Let us write $w = u + v$, where now $v$ is continuously differentiable on
$U$, continuous on $\bar U$, and vanishing on $\partial U$. Then

$$D[w, w] = D[u + v, u + v] = D[u, u] + D[v, v] + 2D[u, v], \tag{11.2}$$

since $D$ is bilinear and symmetric in its arguments.
Now suppose for the moment that $u$ possesses two continuous derivatives
and $v$ possesses one continuous derivative in a neighborhood of $\bar U$. Then

$$D[u, v] = D_U[u, v] = \int_U dv \wedge *du = \int_{\partial U} v * du - \int_U v\,\Delta u = 0. \tag{11.3}$$




Now $D[v, v] = \int_U \sum_i (\partial v/\partial x^i)^2 \ge 0$ and vanishes if and only if all the derivatives
of $v$ vanish, so that $v$ is a constant on each component of $U$. Since $v$ vanishes on
$\partial U$, we conclude that $D[v, v] = 0$ if and only if $v = 0$.

In case $u$ and $v$ are not necessarily differentiable in a neighborhood of $\partial U$,
we establish that $D[u, v] = 0$ by writing

$$D_U[u, v] = \lim D_{U^\alpha}[u, v],$$

where $U^\alpha$ is a sequence of domains which approach $U$ as $\alpha \to 0$ and $\partial U^\alpha \to \partial U$.
Applying the previous argument to each of the domains $U^\alpha$, we conclude that
$D[u, v] = 0$ and $D[v, v] = 0$ if and only if $v = 0$. This proves the theorem. □
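A discrete analogue makes the decomposition (11.2) tangible. In the sketch below the domain is replaced by a square grid, the Dirichlet integral by a sum of squared differences over grid edges, and the harmonic function by the solution of the five-point discrete Laplace equation. All of this is an assumed discretization for illustration (grid size, boundary data `f`, the perturbation `v`, and the Gauss-Seidel solver are hypothetical choices), not the continuous argument of the proof.

```python
import math

def dirichlet_energy(u):
    """Discrete Dirichlet integral: squared differences over all grid edges."""
    n = len(u)
    e = 0.0
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                e += (u[i + 1][j] - u[i][j]) ** 2
            if j + 1 < n:
                e += (u[i][j + 1] - u[i][j]) ** 2
    return e

def solve_laplace(boundary, n, sweeps=2000):
    """Gauss-Seidel iteration for the discrete Laplace equation on an n x n grid."""
    u = [[boundary(i, j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return u

n = 12

def f(i, j):
    # Hypothetical boundary data, sampled from x^2 - y^2 + x on the unit square.
    x, y = i / (n - 1), j / (n - 1)
    return x * x - y * y + x

u = solve_laplace(f, n)
# A perturbation v that vanishes (up to rounding) on the boundary of the grid.
v = [[0.1 * math.sin(math.pi * i / (n - 1)) * math.sin(math.pi * j / (n - 1))
      for j in range(n)] for i in range(n)]
w = [[u[i][j] + v[i][j] for j in range(n)] for i in range(n)]
e_u, e_w, e_v = dirichlet_energy(u), dirichlet_energy(w), dirichlet_energy(v)
```

In this discrete setting the cross term vanishes for the same reason as in (11.3), so `e_w - e_u` should agree with `e_v` up to rounding, and in particular `e_u < e_w`.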


12. PHYSICAL APPLICATIONS 


The study of Laplace's equation and its solutions plays an important role in
many theories of classical physics. This is essentially due to the intimate
connection between the Laplace operator and Euclidean geometry. Since this
is not the place to elaborate on the physical applications, we refer the reader to
any physics text for the details. (See, for instance, Feynman, Leighton, and
Sands, The Feynman Lectures on Physics, Addison-Wesley, Reading, Mass.,
1964, vol. II, especially Chapter 12.) We shall briefly mention some of the
relevant physics in this section.

In electrostatics it is assumed that the electric field $E$ satisfies the equations

$$d * E = \rho, \qquad dE = 0,$$

where $\rho$ is the density of charge and we identify the vector field $E$ with a linear
differential form via the Euclidean metric on $\mathbb{E}^3$. Here $\rho$ is assumed, for the
moment, to be a smooth density. (It is also convenient to consider the limiting
cases of "surface distributions" and "point distributions".) By the second of
these equations, we know that $E$ can be written as

$$E = -d\varphi,$$

where $\varphi$ is a smooth function known as the potential of the electric field. The
first equation then implies that

$$\Delta\varphi = d * d\varphi = -\rho.$$


If $\rho$ is given, then $\varphi$ is determined by the formula

$$\varphi(y) = \frac{1}{4\pi} \int \frac{\rho(x)}{\|x - y\|}\,dx$$

(since $A_2 = 4\pi$). As limiting cases, we obtain, for charge distributed along a
surface $S$ with density $\sigma\,dA$, the formulas

$$\varphi(y) = \frac{1}{4\pi} \int_S \frac{\sigma(x)}{\|x - y\|}\,dA,$$

where $\sigma\,dA$ is the area density of the surface, and

$$\varphi(y) = \frac{1}{4\pi} \int_C \frac{l(x)}{\|x - y\|}\,ds,$$

where $l\,ds$ is a linear distribution along a curve $C$.
For a point charge located at $x$, we obtain

$$\varphi(y) = \frac{e}{4\pi\,\|y - x\|},$$

where $e$ is the magnitude of the charge.

Fig. 12.17

There can be no electric field along a conductor (since this would result in a
motion of the charge, which contradicts the assumption of a static field). Thus
$d\varphi = 0$ and $\varphi = \text{const}$ on the conductor.


In most problems that arise the distribution of charge is not known in
advance, but must be determined. For example, suppose that a unit charge is
placed at a point $x$ inside a cavity whose boundary is a conductor which is
grounded, i.e., kept at zero potential. (See Fig. 12.17.) We want to find the
electric field inside the cavity. There will be a charge distribution induced on
the boundary surface that we also wish to determine. Since there is no charge
distribution anywhere inside the cavity, except at $x$, the potential $\varphi$ must be
harmonic everywhere except at $x$, and, in fact, differs from

$$\frac{1}{4\pi\,\|\cdot - x\|}$$

by a harmonic function. Since we want $\varphi = 0$ on the boundary, we see that the
desired potential $\varphi$ is exactly the Green function for the domain. The surface
density can then be determined from $d\varphi$ along the boundary. (This is another
reason why it is important to know that the solution of Dirichlet's problem is
differentiable at the boundary.)
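For a spherical cavity the potential just described can be written down in closed form by the classical method of images, which gives a concrete instance of the Green function. The sketch below assumes a sphere of radius $R$ centered at the origin in $\mathbb{E}^3$ and a unit charge at an off-center interior point; the function name and sample points are hypothetical.

```python
import math

def grounded_sphere_potential(x, R):
    """Potential of a unit charge at x inside a grounded sphere of radius R
    centered at the origin (n = 3), built from one image charge."""
    r = math.sqrt(sum(c * c for c in x))
    if not 0.0 < r < R:
        raise ValueError("the charge must lie inside the sphere, off its center")
    image = [c * R * R / (r * r) for c in x]   # mirror point R^2 x / ||x||^2
    strength = R / r                           # magnitude of the image charge
    def phi(y):
        return (1.0 / math.dist(y, x) - strength / math.dist(y, image)) / (4.0 * math.pi)
    return phi

phi = grounded_sphere_potential([0.3, -0.2, 0.4], 1.0)

# Sample points on the unit sphere: the potential should vanish there.
boundary_points = [[math.sin(t) * math.cos(s), math.sin(t) * math.sin(s), math.cos(t)]
                   for t in (0.3, 1.1, 2.0, 2.9) for s in (0.0, 1.7, 3.3, 5.0)]
boundary_values = [phi(y) for y in boundary_points]
interior_value = phi([0.0, 0.0, 0.0])
```

The image term is harmonic inside the sphere, so `phi` differs from $1/(4\pi\|\cdot - x\|)$ by a harmonic function and vanishes on the boundary, which are exactly the two properties required of the potential in the text.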

More frequently, we are interested in the electric field outside conducting
surfaces. Here, strictly speaking, the theory we have developed of Dirichlet's
problem does not apply, since we considered only bounded domains. We can
handle this problem in one of two ways. First, we can reduce to the Dirichlet
problem by considering everything contained inside a very large conducting
sphere (at potential zero). Then we let the radius go to infinity to get the
desired potential as a limit. Second, we can modify our arguments in Sections 3
through 10 to include the case of unbounded domains, but restrict all our
functions to vanishing as $c/r$ at infinity. The details are left to the reader.

In the theory of diffusion one assumes that material of some nature is
flowing according to a vector field $X$. If the density of the material is $\rho\,dV$,
then the amount of material flowing per unit time through an oriented piece of
surface $S$ is given by

$$\int_S \rho\,(X, n)\,dA = \int_S *\rho X,$$

where $n$ is the unit normal vector to the surface, $dA$ is the element of area, and
we are regarding $\rho X$ on the right-hand side as a linear differential form via the
Euclidean metric. Thus the net amount of material flowing out of any region is
$d * \rho X = \operatorname{div} \langle X, \rho \rangle$. Now "material" may be produced in the region at a rate
$s\,dV$ (where $s$ is the function describing the rate of productivity of the sources
in the region). Thus the total change of density is given by

$$\frac{\partial \rho}{\partial t}\,dV = \operatorname{div} \langle X, \rho \rangle - s\,dV.$$

If, as we shall assume, the situation is stationary, i.e., the density does not
change, we obtain the equations

$$\operatorname{div} \langle X, \rho \rangle = s\,dV.$$

On the other hand, it frequently happens that the flow is given by the gradient
of some function, i.e.,

$$\rho X = dN$$

for some function $N$. Combining these equations, we obtain

$$\Delta N = s.$$

In the theory of heat $N$ is the temperature, $\rho X$ is the rate of flow of heat, and
$s$ is the density of the sources of heat. The equation $\rho X = dN$ says that the
rate of flow of heat is proportional to the gradient of temperature. In the theory
of diffusion of particles it is assumed that the rate of flow of the material is
proportional to the change (i.e., gradient) of the density of the particles. (This
says that the particles tend to flow from a region of higher density to a region
of lower density.) Then $N$ represents the density of the particles and $s$ represents
the rate of production of particles.

Suppose, for example, that the boundary of a region $D$ is kept at some fixed
temperature $f$, where $f$ is a given function on $\partial D$ and there are no sources of heat
inside $D$. Then the distribution of temperature in $D$ is given by the function $N$
which satisfies the differential equation $\Delta N = 0$ and the boundary condition
$N = f$ on $\partial D$. In other words, the temperature is determined by the solution
of Dirichlet's problem.

In the theory of steady, incompressible, irrotational flow of a fluid it is assumed
that a vector field $X$ is given which represents the flow of the liquid. By the
incompressibility it follows that $d * X = 0$. It is assumed that there is no
circulation, that is, $dX = 0$. Thus $X = d\varphi$ for some function $\varphi$ which is a
solution of Laplace's equation. The natural boundary-value problems that arise
in this case are different from those we have been discussing, so we simply refer
the reader to standard works on fluid mechanics.


SUPPLEMENTARY EXERCISES 


1. Let $V_1$ and $V_2$ be vector spaces with scalar products. A linear map $l: V_1 \to V_2$
is said to be conformal (or a similarity) if

$$(lv, lw) = c\,(v, w),$$

where $c$ is a positive constant. (Note that if $c = 1$, then $l$ is an isometry.)

Let $M_1$ and $M_2$ be manifolds each with a Riemann metric. A differentiable ($C^\infty$) map
$\varphi$ is said to be conformal if $\varphi_{*x}$ is conformal for each $x \in M_1$. Thus, to say that $\varphi$ is
conformal means that

$$(\varphi_{*x}\xi_1, \varphi_{*x}\xi_2) = c(x)\,(\xi_1, \xi_2) \quad\text{if } \xi_1, \xi_2 \in T_x(M_1)$$

for any $x \in M_1$. (Note that this time the $c$ can depend on $x$.)

a) Let $\varphi: U \to \mathbb{R}^2$ be a conformal map, where $U \subset \mathbb{R}^2$ and we use the Euclidean
metric on $U$ and on $\mathbb{R}^2$. Suppose $\varphi$ is given by

$$\varphi(x, y) = (u, v),$$

where $u = u(x, y)$ and $v = v(x, y)$, and where $(x, y)$ and $(u, v)$ are rectangular
coordinates in the plane. Show that either the equations

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$

hold or the equations

$$\frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}$$

hold.
b) Conclude that if $u$ and $v$ are as in (a), then they are both harmonic functions.
2. a) Let $u$ be a harmonic function defined on all of $\mathbb{R}^n$. Show that if $u$ is bounded
then it must be a constant.
b) Using (a) and Exercise 1, show that there is no conformal map of $\mathbb{R}^2$ onto a
bounded subset $U \subset \mathbb{R}^2$.


13. PROBLEM SET: THE CALCULUS OF RESIDUES 


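On a manifold $M$ we can consider complex-valued smooth functions,
differential forms, etc. For instance, we say that the function

$$f = u + iv$$

is a complex-valued $C^\infty$-function if each of the functions $u$ and $v$ is a
real-valued $C^\infty$-function. Similarly, we consider the complex-valued linear
differential form

$$\omega = \sigma + i\tau,$$

where $\sigma, \tau \in \wedge^1(M)$ are linear (real-valued) differential forms. We can then
perform the usual operations, remembering that $i^2 = -1$. Thus

$$f\omega = (u\sigma - v\tau) + i(v\sigma + u\tau).$$

Similarly, if

$$\omega_1 = \sigma_1 + i\tau_1 \quad\text{and}\quad \omega_2 = \sigma_2 + i\tau_2,$$

then

$$\omega_1 \wedge \omega_2 = (\sigma_1 \wedge \sigma_2 - \tau_1 \wedge \tau_2) + i(\sigma_1 \wedge \tau_2 + \tau_1 \wedge \sigma_2)$$

is a complex-valued exterior two-form.
We similarly have the operator $d$ given by

$$df = du + i\,dv, \qquad d\omega = d\sigma + i\,d\tau, \quad\text{etc.}$$

If $X_1$ and $X_2$ are vector fields, we define the "complex vector field" $X_1 + iX_2$
as that differential operator on complex-valued functions which is given by

$$(X_1 + iX_2) f = X_1 f + iX_2 f = (X_1 u - X_2 v) + i(X_1 v + X_2 u).$$

From now on, let $M$ be an open subset of $\mathbb{R}^2$ with rectangular coordinates
$(x, y)$. We set $z = x + iy$, so

$$dz = dx + i\,dy.$$

We let

$$\bar z = x - iy, \qquad d\bar z = dx - i\,dy,$$

and then define the complex vector fields $\partial/\partial z$ and $\partial/\partial \bar z$ by

$$\frac{\partial}{\partial z} = \frac{1}{2}\Bigl(\frac{\partial}{\partial x} - i\,\frac{\partial}{\partial y}\Bigr), \qquad \frac{\partial}{\partial \bar z} = \frac{1}{2}\Bigl(\frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y}\Bigr).$$

We then set

$$\frac{\partial f}{\partial z} = \Bigl(\frac{\partial}{\partial z}\Bigr) f, \qquad \frac{\partial f}{\partial \bar z} = \Bigl(\frac{\partial}{\partial \bar z}\Bigr) f$$

for any complex function $f$.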
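The two operators can be explored numerically. The sketch below assumes the standard definitions $\partial/\partial z = \tfrac12(\partial/\partial x - i\,\partial/\partial y)$ and $\partial/\partial \bar z = \tfrac12(\partial/\partial x + i\,\partial/\partial y)$ and approximates the partial derivatives by central differences; the step size, the sample point `z0`, and the test functions are illustrative choices.

```python
def partials(f, z, h=1e-5):
    """Central-difference approximations of the x and y partial derivatives of f at z."""
    fx = (f(z + h) - f(z - h)) / (2 * h)
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return fx, fy

def d_dz(f, z, h=1e-5):
    fx, fy = partials(f, z, h)
    return 0.5 * (fx - 1j * fy)

def d_dzbar(f, z, h=1e-5):
    fx, fy = partials(f, z, h)
    return 0.5 * (fx + 1j * fy)

z0 = 0.8 - 0.3j
cube = lambda z: z ** 3               # a polynomial in z
mod2 = lambda z: z * z.conjugate()    # |z|^2, which also depends on the conjugate

cube_dz, cube_dzbar = d_dz(cube, z0), d_dzbar(cube, z0)
mod2_dz, mod2_dzbar = d_dz(mod2, z0), d_dzbar(mod2, z0)
```

For $z^3$ the $\bar z$-derivative is numerically zero and the $z$-derivative is close to $3z^2$, while for $|z|^2 = z\bar z$ the two derivatives are close to $\bar z$ and $z$ respectively, in line with the Leibniz rule of Exercise 13.2.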


EXERCISES 
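13.1 Show that for any complex-valued smooth function $f$ we have

$$df = \frac{\partial f}{\partial z}\,dz + \frac{\partial f}{\partial \bar z}\,d\bar z.$$

13.2 Show that Leibniz's rule holds for $\partial/\partial z$ and $\partial/\partial \bar z$; that is,

$$\frac{\partial (fg)}{\partial z} = f\,\frac{\partial g}{\partial z} + g\,\frac{\partial f}{\partial z} \quad\text{and}\quad \frac{\partial (fg)}{\partial \bar z} = f\,\frac{\partial g}{\partial \bar z} + g\,\frac{\partial f}{\partial \bar z}.$$

A function $f$ is called holomorphic (or complex analytic) if $\partial f/\partial \bar z = 0$. For a holomorphic
function $f$ we write

$$f' = \frac{\partial f}{\partial z}.$$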




13.3 Show that dz and dz are independent in the sense that 
if adz+b dz = 0, then @ = Oandd = 0. 


13.4 Conclude that $f$ is holomorphic if and only if $df = h\,dz$ for some $h$.


13.5 Show that
$$d(z^n) = nz^{n-1}\,dz$$

(on $\mathbb{R}^2 - \{0\}$ if $n < 0$) for all integers $n$, so that $z^n$ is holomorphic.


13.6 Conclude that every polynomial $p$ given by $p(z) = \sum_{k=0}^{n} a_k z^k$ is holomorphic and
that
$$p'(z) = \sum_{k=1}^{n} k a_k z^{k-1}.$$
13.7 Define the function $e^z$ by setting $e^z = e^x(\cos y + i\sin y)$. Show that
$$d e^z = e^z\,dz.$$


13.8 Show that the product of two holomorphic functions is holomorphic. 

13.9 Let $f = u + iv$ be a holomorphic function, and consider the map sending
$(x, y) \mapsto (u, v)$ as a map of $M \subset \mathbb{R}^2$ into $\mathbb{R}^2$. Show that this map is conformal and that
its Jacobian determinant at the point $(x, y)$ is $|f'(z)|^2$ if $z = x + iy$.

13.10 Let $g$ be a holomorphic function defined on the image of $M$ under the map of
$M \subset \mathbb{R}^2$ corresponding to $f$. Define $g \circ f$ by

$$g \circ f(x, y) = g(u(x, y), v(x, y))$$
[which we can write, for short, as $g \circ f(z) = g(f(z))$]. Show that $g \circ f$ is holomorphic
and that
$$(g \circ f)'(z) = g'(f(z))\,f'(z).$$
13.11 Let $U$ be a domain with almost regular boundary in the plane. By Stokes'
theorem we have
$$\int_{\partial U} \psi = \int_U d\psi$$

for any complex-valued linear differential form $\psi$. (This is to be interpreted, as usual,
as the equality of the real and imaginary parts of both sides.) Conclude that

$$\int_{\partial U} f\,dz = \int_U df \wedge dz = \int_U \frac{\partial f}{\partial \bar z}\,d\bar z \wedge dz.$$


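13.12 Show that $d\bar z \wedge dz = 2i\,dx \wedge dy$, so that

$$\int_{\partial U} f\,dz = 2i\int_U \frac{\partial f}{\partial \bar z}\,dx \wedge dy = 2i\int_U \frac{\partial f}{\partial \bar z}.$$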


13.13 Conclude that if $f$ is a smooth function defined in a neighborhood of $\bar U$, and
if $f$ is holomorphic in $U$, then
$$\int_{\partial U} f\,dz = 0.$$




Fig. 12.18 Fig. 12.19 


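13.14 Let $\xi = r + is$, where $(r, s) \in U$ and $g$ is a smooth function defined in a
neighborhood of $\bar U$. (See Fig. 12.18.) Show that

$$g(\xi) = \frac{1}{2\pi i}\left[\int_{\partial U} \frac{g(z)}{z - \xi}\,dz + \int_U \frac{\partial g/\partial\bar z}{z - \xi}\,dz \wedge d\bar z\right].$$

[Hint: Apply Exercise 13.11 to the function $g(z)/(z - \xi)$ and the domain $U - B_\epsilon$,
where $B_\epsilon$ is a disk of radius $\epsilon$ about $\xi$. Then let $\epsilon \to 0$.]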


13.15 Conclude that if $f$ is holomorphic in $U$, then for any $\xi \in U$,

$$f(\xi) = \frac{1}{2\pi i}\int_{\partial U} \frac{f(z)}{z - \xi}\,dz.$$
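The integral formula of Exercise 13.15 lends itself to a quick numerical check. The sketch below assumes that $U$ is the unit disk and takes $f = \exp$ as a sample holomorphic function; the function name `cauchy_integral`, the number of sample points, and the test point are hypothetical choices made only for illustration.

```python
import cmath
import math

def cauchy_integral(f, xi, n=2000):
    """Approximate (1/(2*pi*i)) * integral over the unit circle of f(z)/(z - xi) dz."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = cmath.exp(1j * theta)
        dz = 1j * z * (2 * math.pi / n)   # dz = i e^{i theta} d theta
        total += f(z) / (z - xi) * dz
    return total / (2j * math.pi)

xi = 0.2 - 0.3j                    # a point inside the unit disk
approx = cauchy_integral(cmath.exp, xi)
exact = cmath.exp(xi)
```

The boundary values of $f$ alone reproduce $f(\xi)$, so `approx` and `exact` agree to rounding error.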
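13.16 Let $f$ be holomorphic in $U - \{a_1\} \cup \cdots \cup \{a_k\}$ (and let $f$ be smooth in
$$W - \{a_1\} \cup \cdots \cup \{a_k\},$$
where $W \supset \bar U$). Suppose that in a neighborhood of each $a_i$
$$f(z) = B_{i,1}(z - a_i)^{-1} + \cdots + B_{i,n_i}(z - a_i)^{-n_i} + g_i(z),$$
where $g_i$ is holomorphic in the neighborhood. (See Fig. 12.19.) Show that

$$\frac{1}{2\pi i}\int_{\partial U} f\,dz = B_{1,1} + B_{2,1} + \cdots + B_{k,1}.$$

[Hint: Apply Exercise 13.13 to $U - \bigcup_i D_{i,\epsilon}$, where $D_{i,\epsilon}$ are small disks around the $a_i$,
and let $\epsilon \to 0$.]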


The result of Exercise 13.16 can frequently be used to evaluate definite integrals. For
example, suppose we wish to evaluate

$$\int_0^{2\pi} R(\cos\theta, \sin\theta)\,d\theta,$$

where $R$ is some rational function of two variables. If we set $z = e^{i\theta}$, then this integral
becomes
$$\frac{1}{i}\int_{\partial D_1} R\left[\frac12\left(z + z^{-1}\right), \frac{1}{2i}\left(z - z^{-1}\right)\right]\frac{dz}{z},$$




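where $D_1$ is the unit disk. To apply Exercise 13.16, we must merely find the points $a_i$
and the coefficients $B_{i,1}$ to evaluate the integral. For instance, if $a > 1$,

$$\int_0^{\pi} \frac{d\theta}{a + \cos\theta} = \frac12\int_0^{2\pi} \frac{d\theta}{a + \cos\theta} = \frac{1}{i}\int_{\partial D_1} \frac{dz}{z^2 + 2az + 1}.$$
Now $z^2 + 2az + 1 = (z - a_1)(z - a_2)$, where $a_1 = -a + (a^2 - 1)^{1/2}$ and
$a_2 = -a - (a^2 - 1)^{1/2}$. Thus only $a_1 \in D_1$, and

$$\frac{1}{z^2 + 2az + 1} = \frac{1}{a_1 - a_2}\left(\frac{1}{z - a_1} - \frac{1}{z - a_2}\right),$$
so the integral is evaluated as

$$\int_0^{\pi} \frac{d\theta}{a + \cos\theta} = \frac{\pi}{\sqrt{a^2 - 1}}.$$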
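The value $\pi/\sqrt{a^2 - 1}$ can be compared with a direct quadrature of the integral over $[0, \pi]$. This is only a sketch under stated assumptions: the midpoint rule, the sample value $a = 2$, and the helper names are illustrative and do not come from the text.

```python
import math

def integral_0_pi(a, n=20000):
    """Midpoint-rule approximation of the integral of 1/(a + cos(theta)) over [0, pi]."""
    h = math.pi / n
    return sum(h / (a + math.cos((k + 0.5) * h)) for k in range(n))

def residue_value(a):
    """Value obtained from the single pole a_1 = -a + sqrt(a^2 - 1) inside the unit disk."""
    a1 = -a + math.sqrt(a * a - 1)
    a2 = -a - math.sqrt(a * a - 1)
    # (1/i) * 2*pi*i * (coefficient of 1/(z - a1) in 1/((z - a1)(z - a2)))
    return 2 * math.pi / (a1 - a2)

a = 2.0
numeric = integral_0_pi(a)
closed_form = math.pi / math.sqrt(a * a - 1)
```

Both `numeric` and `residue_value(a)` reproduce `closed_form`, which shows that only the pole inside the unit disk contributes.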
13.17 Evaluate
$$\int_0^{\pi/2} \frac{d\theta}{a + \sin^2\theta} \qquad (a > 0).$$
13.18 Evaluate
$$\int_0^{2\pi} \frac{\sin^2\theta}{a + b\cos\theta}\,d\theta.$$


Let $P$ and $Q$ be polynomials such that $Q(x) \neq 0$ for any real $x$. Suppose that

$$P(z) = a_{n-2}z^{n-2} + a_{n-3}z^{n-3} + \cdots + a_1 z + a_0$$
and
$$Q(z) = b_n z^n + b_{n-1}z^{n-1} + \cdots + b_1 z + b_0,$$

where $b_n \neq 0$. In other words, the degree of $Q$ is at least two more than the degree of
$P$. Then $P/Q$ is absolutely integrable and

$$\int_{-\infty}^{\infty} \frac{P(x)}{Q(x)}\,dx = \lim_{R \to \infty}\int_{-R}^{R} \frac{P(x)}{Q(x)}\,dx.$$

Fig. 12.20


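Now consider $\int_{\partial U} [P(z)/Q(z)]\,dz$, where $U$ is a semidisk of radius $R$. (See Fig. 12.20.)
The integral over $\partial U$ splits into two parts:

$$\int_{\partial U} \frac{P(z)}{Q(z)}\,dz = \int_{-R}^{R} \frac{P(x)}{Q(x)}\,dx + \int_{\Gamma_R} \frac{P(z)}{Q(z)}\,dz,$$

where $\Gamma_R$ is half the circle of radius $R$. For large values of $R$ we have, for $|z| = R$,

$$\left|\frac{P(z)}{Q(z)}\right| \le \frac{1}{R^2} \cdot \frac{|a_{n-2}| + |a_{n-3}|\frac{1}{R} + \cdots + |a_0|\frac{1}{R^{n-2}}}{|b_n| - |b_{n-1}|\frac{1}{R} - \cdots - |b_0|\frac{1}{R^n}} \le \frac{K}{R^2}$$

for some constant $K$, so that

$$\left|\int_{\Gamma_R} \frac{P(z)}{Q(z)}\,dz\right| = \left|\int_0^{\pi} \frac{P(Re^{i\theta})}{Q(Re^{i\theta})}\,iRe^{i\theta}\,d\theta\right| \le \frac{\pi K}{R},$$
and thus $\lim_{R \to \infty}\int_{\Gamma_R} = 0$. We can then apply Exercise 13.16, since there will be only a
finite number of zeros of $Q$ in the upper half-plane.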
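The recipe just described (close the contour in the upper half-plane and add up the coefficients $B_{i,1}$) can be sketched numerically. In the following, each coefficient is itself estimated by integrating around a small circle; the integrands $1/(x^2+1)$ and $1/(x^2+1)^2$, the circle radius, and the function names are assumptions chosen for illustration.

```python
import cmath
import math

def residue(f, a, radius=0.5, n=2000):
    """Approximate (1/(2*pi*i)) * (integral of f over a circle of the given radius about a)."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)
        total += f(a + w) * w      # f(z) dz with dz = i*w*dtheta; constants cancel below
    return total / n

def real_line_integral(f, upper_poles):
    """2*pi*i times the sum of the residues of f at its poles in the upper half-plane."""
    return (2j * math.pi * sum(residue(f, a) for a in upper_poles)).real

first = real_line_integral(lambda z: 1 / (z * z + 1), [1j])        # expected pi
second = real_line_integral(lambda z: 1 / (z * z + 1) ** 2, [1j])  # expected pi/2
```

Both integrands have their only upper half-plane pole at $z = i$, and the circle of radius 0.5 about it excludes the other pole at $z = -i$.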


13.19 Evaluate 


PQ) 
Q(z) 


dx 
[. Gray 7%. 
13.20 Show that
$$\int_{-\infty}^{\infty} \frac{dx}{(x^2 + 1)^n} = \frac{\pi\,(2n - 2)!}{2^{2n-2}\,[(n - 1)!]^2}.$$


CHAPTER 13 
CLASSICAL MECHANICS 


In this chapter we shall present a brief introduction to the study of classical 
theoretical mechanics. We emphasize that what we are studying is a branch 
of mathematics—idealized from the mathematical considerations common to 
many problems arising in the physics of mechanical systems. We will thus 
formulate, on an axiomatic basis, a mathematical model that describes features 
that arise in the study of the equations of mechanics. The model will apply to 
most situations arising in the mechanical applications. In order to simplify the 
mathematics, we have sacrificed a certain amount of generality, and therefore 
there will be some situations in classical mechanics where our model is inadequate. 

Mechanics is devoted to the study of how a physical system evolves in time. 
The first fundamental assumption is that the system can be described (locally)
by a collection of continuous parameters. More precisely, we assume that the 
set of all possible “positions” or configurations of the physical system is a 
differentiable manifold M which is called the configuration space. 

For example, if the physical system consists of three particles free to move 
in space, we can describe the “position” of the system at any given instant by 
giving the positions of each of the three particles. Thus, in this case, the con- 
figuration space is 

$$E^3 \times E^3 \times E^3.$$


If we insist that no two particles be able to occupy the same position of space at 
the same time, then the configuration space M is given by 


$$M = E^3 \times E^3 \times E^3 - S,$$
where

$$S = \{<v_1, v_2, v_3> : v_i \in E^3 \text{ and } v_1 = v_2 \text{ or } v_1 = v_3 \text{ or } v_2 = v_3\}.$$


If the physical system is a rigid body (say a top) spinning about some point 
held fixed in space, then the position of the system is completely described by 
giving the position in space of three orthonormal vectors on the body (drawn 
through the fixed point, say). Since these vectors can be placed in any position 
but must remain orthonormal and cannot change orientation (from right- to
left-handedness), we see that M is the set of all oriented orthonormal bases of $E^3$.
In this case M is a three-dimensional manifold. If we fix some arbitrary initial
orthonormal basis $<e_1, e_2, e_3>$, then any possible position of the system is
given by $<v_1, v_2, v_3>$, where $v_i = Ae_i$ and A is a rotation, that is, A is an
orthogonal linear transformation with positive determinant. Thus M is diffeomorphic
to $O^+(3)$, the space of all orthogonal three-by-three matrices with
positive determinant.

The basic problem of mechanics is to describe how the configuration of the 
system changes in time. As the system evolves, the configuration at any instant 
t will be given by some point $C(t) \in M$. Our problem is to give a reasonably
simple description of the possible curves C which can arise as actual changes
in the configuration of the mechanical system. 

The second fundamental assumption of classical mechanics is that the
curve C(t) can be determined from a knowledge of the “state” of the system
at any given time. That is, we may (and, in general, will) have to know more
about the system at a given time than its configuration in order to be able to
predict its future configuration. However, if we do have enough such instantaneous
information, we can determine the curve C(t). The total amount of
relevant information is called the state of the system. It is assumed that the set
of all states is itself a differentiable manifold, S. Since the state of a system contains
more information than the configuration, we can assign to every state s the
configuration $\pi(s) \in M$ of the system in the state s. In other words, we have a
map $\pi: S \to M$. We assume that this map $\pi$ is differentiable.

It is assumed that if we know the state s of the system at time $t_0$, we can
predict its state at any future time t. Thus we are given a map

$$\varphi_{t,t_0}: S \to S$$

such that if the system is in the state s at time $t_0$, it will be in the state $\varphi_{t,t_0}(s)$
at time t. Now there is nothing special about the time $t_0$. If $t_2 > t_1 > t_0$ and s
is the state at time $t_0$, then $\varphi_{t_1,t_0}(s)$ is the state at time $t_1$, and therefore

$$\varphi_{t_2,t_1}(\varphi_{t_1,t_0}(s))$$
is the state at time $t_2$.
In other words,

$$\varphi_{t_2,t_1} \circ \varphi_{t_1,t_0} = \varphi_{t_2,t_0}.$$


Now it turns out that in most (basic) mechanical systems, if the time is suitably
parametrized, the function $\varphi_{t,t_0}$ really depends only on $t - t_0$. (This fails to
hold in the so-called nonconservative systems. A typical nonconservative
system is one involving friction. This is usually a consequence of not studying
a sufficiently complete system. In the case of friction, for example, the heat loss
must be taken into account.) Let us assume that $\varphi_{t,t_0}$ depends only on $t - t_0$,
and write $\varphi_{t - t_0}$ for $\varphi_{t,t_0}$. If we set

$$s_1 = t_1 - t_0 \quad\text{and}\quad s_2 = t_2 - t_1,$$
then the previous equation can be written as

$$\varphi_{s_2} \circ \varphi_{s_1} = \varphi_{s_1 + s_2}.$$




This looks like the defining relation for a one-parameter group except that so
far we have been restricting ourselves to nonnegative values of s. In point of
fact, it is assumed that this restriction is unnecessary and that we are given
a flow $\varphi$ on S.

To repeat, we are given a differentiable manifold M representing the set of
all possible configurations of our system, a differentiable manifold S representing
all possible states of the system, a differentiable map $\pi: S \to M$, and a flow $\varphi$ on
S. Then the curves C(t) are all assumed to be of the form

$$C(t) = \pi(\varphi_t(x)) \quad\text{for some } x \in S.$$
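To make the data $(M, S, \pi, \varphi)$ concrete, here is a minimal sketch for one hypothetical system: a unit harmonic oscillator, with $M$ the line, $S$ the plane of pairs (position, momentum), $\pi$ the projection onto the first component, and $\varphi_t$ a rotation of the plane. The example and all names in it are assumptions for illustration; the text has not yet specified any particular flow.

```python
import math

def flow(t, state):
    """Flow of the unit harmonic oscillator on the state space S = R^2."""
    q, p = state
    return (q * math.cos(t) + p * math.sin(t),
            -q * math.sin(t) + p * math.cos(t))

def projection(state):
    """The map pi: S -> M that forgets the momentum."""
    return state[0]

s1, s2 = 0.4, 1.1
x = (1.0, -0.5)
composed = flow(s2, flow(s1, x))      # phi_{s2} o phi_{s1}
direct = flow(s1 + s2, x)             # phi_{s1 + s2}
back = flow(-s1, flow(s1, x))         # negative times are allowed for a flow
curve = [projection(flow(0.1 * k, x)) for k in range(5)]   # samples of C(t)
```

Here `composed` and `direct` agree, `back` returns the initial state, and `curve` samples the configuration curve $C(t) = \pi(\varphi_t(x))$.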


We must therefore describe S, $\pi$, and $\varphi$ for any given configuration space M.
Classical mechanics makes a very definite assumption about the nature of the
space S (and the map $\pi$). It asserts that the state of a system is completely
determined by its (configuration and its) “momentum”. What is the momentum
of our abstract setup? As usual, the momentum should be something that resists
change in velocity. It turns out that an appropriate object representing “infinitesimal
resistance to change in instantaneous velocity” is a cotangent vector. A
heuristic motivation for this, which the reader may choose to ignore, is the following:
At any given configuration $x \in M$, the set of all possible velocity vectors
is just $T_x(M)$. At any given $v \in T_x(M)$, a “resistance to change in v” would be
some function defined near v and vanishing at v. To first order we could replace
such a function by its differential. Thus “infinitesimal resistance” is a linear
function on $T_v(T_x(M))$. Since $T_x(M)$ is a vector space, we may identify
$T_v(T_x(M))$ with $T_x(M)$ for each v. Thus all possible “momenta” become identified
with all elements of $T_x^*(M)$.

The set S is thus taken to be the set of all momenta, i.e., the set of all
cotangent vectors at all points of M. We must first show how to make this
space into a differentiable manifold.


1. THE TANGENT AND COTANGENT BUNDLES


Let M be a differentiable manifold, and let us consider the set T(M) of all
tangent vectors to all points of M. Thus

$$T(M) = \bigcup_{x \in M} T_x(M).$$

Let $\pi$ denote the map of T(M) onto M which assigns to each tangent vector the
point of M at which it is defined; that is, if $\xi \in T_x(M)$, then $\pi(\xi) = x$. We
claim that T(M) can be made into a differentiable manifold in a natural way,
so that $\pi$ is a differentiable map. In fact, let $\mathcal{A}$ be an atlas on M. For any
$(U, \alpha) \in \mathcal{A}$, define $T(\alpha): \pi^{-1}(U) \to V \oplus V$ by setting

$$T(\alpha)\xi = <\alpha(x), \xi_\alpha> \quad\text{if } \xi \in T_x(M),\ x \in U.$$   (1.1)
We claim that the collection $(\pi^{-1}(U), T(\alpha))$ is an atlas on T(M). To check that
$T(\alpha)$ is one-to-one, we observe that if $\xi \in T_x(M)$ and $\eta \in T_y(M)$ with $x \neq y$,
then $\alpha(x) \neq \alpha(y)$; while if $\xi \neq \eta$ and both lie in $T_x(M)$, we have $\xi_\alpha \neq \eta_\alpha$.
To check that the transition law is satisfied, we note that Eq. (4.3), Chapter 9
implies that if $(U, \alpha)$ and $(W, \beta)$ are two charts of $\mathcal{A}$,

$$T(\beta) \circ T(\alpha)^{-1}<v, \xi_\alpha> = <\beta \circ \alpha^{-1}(v), J_{\beta\circ\alpha^{-1}}(v)\xi_\alpha>$$   (1.2)
for $<v, \xi_\alpha> \in T(\alpha)(\pi^{-1}(U \cap W))$, that is, $v \in \alpha(U \cap W)$ and $\xi_\alpha$ arbitrary
in V.

The fact that the structure on T(M) does not depend on the choice of $\mathcal{A}$
is obvious. That $\pi$ is differentiable is also clear; in fact, $(\pi^{-1}(U), T(\alpha))$ and
$(U, \alpha)$ are compatible charts in terms of which $\alpha \circ \pi \circ T(\alpha)^{-1}<v, w> = v$. We
call T(M), together with its structure as a differentiable manifold, the tangent
bundle of M.
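The transition law (1.2) says that a change of chart acts on the base point by $\beta \circ \alpha^{-1}$ and on the vector part by the Jacobian of that map. The following sketch illustrates this for one assumed pair of charts on a region of the plane: $\alpha$ the Cartesian chart and $\beta$ the polar chart. The function names and the sample point are hypothetical.

```python
import math

def to_polar(v):
    """Transition map beta o alpha^{-1}: Cartesian (x, y) -> polar (r, theta)."""
    x, y = v
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian(v):
    """Jacobian matrix of to_polar at v."""
    x, y = v
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))

def tangent_transition(v, xi):
    """T(beta) o T(alpha)^{-1}: <v, xi> -> <to_polar(v), J(v) xi>."""
    J = jacobian(v)
    pushed = (J[0][0] * xi[0] + J[0][1] * xi[1],
              J[1][0] * xi[0] + J[1][1] * xi[1])
    return to_polar(v), pushed

v, xi = (1.0, 2.0), (0.3, -0.4)
base, fibre = tangent_transition(v, xi)

# Independent check: differentiate the polar image of the curve t -> v + t*xi.
h = 1e-6
plus = to_polar((v[0] + h * xi[0], v[1] + h * xi[1]))
minus = to_polar((v[0] - h * xi[0], v[1] - h * xi[1]))
finite_diff = tuple((a - b) / (2 * h) for a, b in zip(plus, minus))
```

The vector part `fibre` agrees with `finite_diff`, the velocity of the same curve measured in polar coordinates.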

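If M is finite-dimensional, let $(U, \alpha)$ be a chart of M with coordinates
$x^1, \ldots, x^n$, so that

$$\alpha(x) = <x^1(x), \ldots, x^n(x)> \in \mathbb{R}^n.$$

We will denote the coordinates associated to the chart $T(\alpha)$ on $\pi^{-1}(U)$ by
$<q^1, \ldots, q^n, \dot q^1, \ldots, \dot q^n>$. Hence if $\xi \in T_x(M)$, where $x \in U$, we have

$$T(\alpha)(\xi) = <\alpha(x), \xi_\alpha> = <q^1(\xi), \ldots, q^n(\xi), \dot q^1(\xi), \ldots, \dot q^n(\xi)>,$$
so that
$$q^i = x^i \circ \pi \quad\text{and}\quad \xi = \sum_i \dot q^i(\xi)\left(\frac{\partial}{\partial x^i}\right)_x.$$   (1.3)

In other words, the $q^i$ are just the coordinates $x^i$ regarded as functions on T(M)
via $\pi$, and the $\dot q^i(\xi)$ are the components of $\xi$ relative to the basis

$$\left\{\left(\frac{\partial}{\partial x^1}\right)_x, \ldots, \left(\frac{\partial}{\partial x^n}\right)_x\right\}.$$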


We can follow a similar procedure for
$$T^*(M) = \bigcup_{x \in M} T_x^*(M),$$

which we call the cotangent bundle. If $(U, \alpha)$ is a chart of an atlas $\mathcal{A}$ of M,
define the map
$$T^*(\alpha): \pi^{-1}(U) \to V \oplus V^*$$
by setting
$$T^*(\alpha)(l) = <\alpha(x), l_\alpha>$$   (1.4)

for $l \in T_x^*(M)$ and $x \in U$. As before, this defines an atlas on $T^*(M)$ which,
in turn, defines a differentiable structure on $T^*(M)$ which is independent of the
choice of atlas of M. (Note that we have used the same letter, $\pi$, to denote the
two projections: that of $T(M) \to M$ and that of $T^*(M) \to M$. Whenever there
is any confusion, we shall denote these maps by $\pi_T$ and $\pi_{T^*}$.)

If M is finite-dimensional and $(U, \alpha)$ is a chart with coordinates $x^1, \ldots, x^n$,
we shall denote the coordinates of $(\pi^{-1}(U), T^*(\alpha))$ by
$$<q^1, \ldots, q^n, p^1, \ldots, p^n>.$$
Thus, if $l \in T_x^*(M)$, where $x \in U$, we have
$$T^*(\alpha)l = <\alpha(x), l_\alpha> = <q^1(l), \ldots, q^n(l), p^1(l), \ldots, p^n(l)>,$$
so that
$$q^i = x^i \circ \pi \quad\text{and}\quad l = \sum_i p^i(l)\,(dx^i)_x.$$   (1.5)


2. EQUATIONS OF VARIATION 


Let $M_1$ and $M_2$ be two differentiable manifolds, and let $\varphi: M_1 \to M_2$ be a differentiable
map. Then $\varphi$ induces a map $T(\varphi)$ of $T(M_1) \to T(M_2)$ when we set

$$T(\varphi)\xi = \varphi_{*x}(\xi) \quad\text{if } \xi \in T_x(M_1).$$   (2.1)

To check that $T(\varphi)$ is differentiable, let us choose compatible charts $(U, \alpha)$ on
$M_1$ and $(W, \beta)$ on $M_2$. Then it follows from the definition of $T(\varphi)$ that

$$(\pi^{-1}(U), T(\alpha)) \quad\text{and}\quad (\pi^{-1}(W), T(\beta))$$
are compatible charts. Furthermore,
$$T(\beta) \circ T(\varphi) \circ T(\alpha)^{-1}<v_1, v_2> = <(\beta \circ \varphi \circ \alpha^{-1})v_1, J_{\beta\circ\varphi\circ\alpha^{-1}}(v_1)v_2>.$$   (2.2)

This establishes that $T(\varphi)$ is differentiable. Observe that if $\psi: M_2 \to M_3$ is a
second differentiable map, then it follows from Eq. (4.4) of Chapter 9 that

$$T(\psi \circ \varphi) = T(\psi) \circ T(\varphi).$$   (2.3)
Furthermore, it is clear from the definition that
$$\pi \circ T(\varphi) = \varphi \circ \pi.$$   (2.4)

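In other words, the diagram

$$\begin{array}{ccc}
T(M_1) & \xrightarrow{\;T(\varphi)\;} & T(M_2) \\
\downarrow \pi & & \downarrow \pi \\
M_1 & \xrightarrow{\;\;\varphi\;\;} & M_2
\end{array}$$

commutes in the sense that it doesn't matter which path one takes to get from
$T(M_1)$ to $M_2$.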

In particular, if $\{\varphi_t\}$ is a one-parameter group of diffeomorphisms of M, we
get a one-parameter group $T(\varphi_t)$ on T(M), where, by (2.4),

$$\pi \circ T(\varphi_t)\xi = \varphi_t(\pi(\xi)).$$   (2.5)

If X is the infinitesimal generator of $\{\varphi_t\}$, let us denote by T(X) the infinitesimal
generator of $\{T(\varphi_t)\}$. It follows from (2.5) that if $\xi \in T_x(M)$, then
$$\pi_*(T(X)_\xi) = X_x.$$   (2.6)

Let us obtain the expression for T(X) in terms of a chart: Let $(U, \alpha)$ be a
chart, and choose an open set W such that $W \subset U$ and $\epsilon > 0$ such that $\varphi_t(W) \subset U$
for $|t| < \epsilon$. Then

$$T(\alpha) \circ T(\varphi_t) \circ T(\alpha)^{-1}<v, w> = <(\alpha \circ \varphi_t \circ \alpha^{-1})v, J_{\alpha\circ\varphi_t\circ\alpha^{-1}}(v)w>,$$

so that, differentiating with respect to t at t = 0, we see by Eq. (4.9) of Chapter 9
that
$$T(X)_{T(\alpha)}<v, w> = <X_\alpha(v), d(X_\alpha)_v(w)>.$$   (2.7)


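If M is finite-dimensional, $x^1, \ldots, x^n$ are the coordinates of $(U, \alpha)$, and
$$X_\alpha = <X^1, \ldots, X^n>,$$

then we can rewrite (2.7) as

$$T(X)_{T(\alpha)}<x, w> = <X^1, \ldots, X^n, \sum_j \frac{\partial X^1}{\partial x^j}w^j, \ldots, \sum_j \frac{\partial X^n}{\partial x^j}w^j>,$$   (2.7′)
where $w = <w^1, \ldots, w^n>$. In other words, in terms of the local coordinates
$<q^1, \ldots, q^n, \dot q^1, \ldots, \dot q^n>$ the differential equations corresponding to T(X)
take the form

$$\frac{dq^i}{dt} = X^i(q^1, \ldots, q^n)$$   (2.8)
and
$$\frac{d\dot q^i}{dt} = \sum_j \frac{\partial X^i}{\partial x^j}\,\dot q^j.$$   (2.9)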

Note that Eqs. (2.8) are just the local equations corresponding to the vector field
X. Since $q^i = x^i \circ \pi$, this is simply another way of writing (2.6). Suppose that

$$\varphi_t(x) = <x^1(t), \ldots, x^n(t)>$$
is a solution curve of X lying in U. Then we can regard the coefficients

$$\frac{\partial X^i}{\partial x^j}(x^1(t), \ldots, x^n(t))$$

as functions of t alone. Thus (2.9) takes the form

$$\frac{dw}{dt} = A(t)w, \qquad A(t) = \left(\frac{\partial X^i}{\partial x^j}(x^1(t), \ldots, x^n(t))\right),$$
of a linear differential equation for w. This linear differential equation is called
the equation of variation of X along the curve $\varphi_t(x)$. Roughly speaking, it represents,
to linear approximation, how solution curves of X near $\varphi_t(x)$ are deviating
from $\varphi_t(x)$.
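The interpretation of the equation of variation can be tested numerically. The sketch below assumes a one-dimensional vector field $X(x) = \sin x$, chosen only for illustration, and integrates Eqs. (2.8) and (2.9) together with a classical Runge-Kutta scheme. It then compares the resulting $w$ with the rate at which two nearby solution curves separate. All names and step sizes are hypothetical.

```python
import math

def rk4_step(f, y, h):
    """One classical Runge-Kutta step for y' = f(y), y a tuple of floats."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h * (p + 2 * q + 2 * r + s) / 6
                 for a, p, q, r, s in zip(y, k1, k2, k3, k4))

def integrate(f, y, t_end, steps=2000):
    """Integrate y' = f(y) from 0 to t_end with a fixed step."""
    h = t_end / steps
    for _ in range(steps):
        y = rk4_step(f, y, h)
    return y

# Hypothetical vector field X(x) = sin(x) on the line, with dX/dx = cos(x).
field = lambda y: (math.sin(y[0]),)
lifted = lambda y: (math.sin(y[0]), math.cos(y[0]) * y[1])   # Eqs. (2.8) and (2.9)

x0, t_end, eps = 0.5, 1.5, 1e-5
_, w = integrate(lifted, (x0, 1.0), t_end)
spread = (integrate(field, (x0 + eps,), t_end)[0]
          - integrate(field, (x0 - eps,), t_end)[0]) / (2 * eps)
```

Starting from $w(0) = 1$, the solution `w` of the linear equation matches `spread`, the finite-difference derivative of the flow with respect to the initial point.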

We now go through a similar construction for the cotangent bundle. If
$\varphi: M_1 \to M_2$ is a differentiable map, then $\varphi_x^*: T_{\varphi(x)}^*(M_2) \to T_x^*(M_1)$ is going in
the wrong direction. We therefore restrict our attention to maps which are
locally diffeomorphisms, and we define $T^*(\varphi)$ by setting

$$T^*(\varphi)l = (\varphi^*)^{-1}l = (\varphi_x^*)^{-1}l \quad\text{if } l \in T_x^*(M_1).$$   (2.10)
Since $\varphi$ is a diffeomorphism,
$$(\varphi_x^*)^{-1}: T_x^*(M_1) \to T_{\varphi(x)}^*(M_2)$$

is well defined, and we have
$$\pi \circ T^*(\varphi) = \varphi \circ \pi.$$   (2.11)


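If $(U, \alpha)$ and $(W, \beta)$ are compatible charts on $M_1$ and $M_2$, then
$$T^*(\beta) \circ T^*(\varphi) \circ T^*(\alpha)^{-1}<v_1, v_2> = <(\beta \circ \varphi \circ \alpha^{-1})v_1, [J_{\beta\circ\varphi\circ\alpha^{-1}}(v_1)^*]^{-1}v_2>,$$   (2.12)
and so $T^*(\varphi)$ is differentiable.
If $\varphi: M_1 \to M_2$ and $\psi: M_2 \to M_3$ are diffeomorphisms, then
$$T^*(\psi \circ \varphi)l = ((\psi \circ \varphi)^*)^{-1}l = (\varphi^* \circ \psi^*)^{-1}l = (\psi^*)^{-1} \circ (\varphi^*)^{-1}l,$$
so that
$$T^*(\psi \circ \varphi) = T^*(\psi) \circ T^*(\varphi).$$   (2.13)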


In particular, if $\{\varphi_t\}$ is a flow on a manifold M, we obtain a flow $T^*(\varphi_t)$
on $T^*(M)$. It satisfies
$$\pi \circ T^*(\varphi_t)l = \varphi_t(\pi(l)).$$   (2.14)

If X is the infinitesimal generator of $\{\varphi_t\}$, we denote by $T^*(X)$ the infinitesimal
generator of $\{T^*(\varphi_t)\}$. It satisfies

$$\pi_*T^*(X)_l = X_x \quad\text{for } l \in T_x^*(M).$$   (2.15)


3. THE FUNDAMENTAL LINEAR DIFFERENTIAL FORM ON T*(M)


Before returning to our study of mechanics, we study in some detail the geometry
of the cotangent bundle. Let M be a differentiable manifold, and let z be a point
of $T^*(M)$. Let $\xi$ be a tangent vector to $T^*(M)$ at the point z, so that

$$\xi \in T_z(T^*(M)).$$

Then $\pi_*\xi$ is an element of $T_{\pi(z)}(M)$. Since $z \in T_{\pi(z)}^*(M)$ is a linear function on
$T_{\pi(z)}(M)$, we can consider the expression

$$\langle \pi_*\xi, z\rangle,$$
which depends linearly on $\xi$. We denote this linear function of $\xi$ by $\theta_z$. We have
thus defined a linear differential form $\theta$ on $T^*(M)$ by setting
$$\langle \xi, \theta_z\rangle = \langle \pi_*\xi, z\rangle \quad\text{for } \xi \in T_z(T^*(M)).$$   (3.1)

The form $\theta$ is called the fundamental linear form of $T^*(M)$. Let us obtain the
expression for $\theta$ in terms of a chart $(\pi^{-1}(U), T^*(\alpha))$. Since $T^*(\alpha)$ maps $\pi^{-1}(U)$
onto an open set O of $V \oplus V^*$, the expression $\theta_{T^*(\alpha)}$ should be a function from
O to $(V \oplus V^*)^*$, which can be identified with $V^* \oplus V$. Let us evaluate this
function. In terms of the chart $(U, \alpha)$ on M and $(\pi^{-1}(U), T^*(\alpha))$ on $T^*(M)$,
we have

$$\xi_{T^*(\alpha)} = <v, w^*> \in V \oplus V^* \quad\text{and}\quad (\pi_*\xi)_\alpha = v \in V,$$
so that
$$\langle \pi_*\xi, z\rangle = \langle v, z_\alpha\rangle.$$
Thus $\langle <v, w^*>, (\theta_z)_{T^*(\alpha)}\rangle = \langle v, z_\alpha\rangle$; that is,
$$(\theta_z)_{T^*(\alpha)} = <z_\alpha, 0> \in V^* \oplus V = (V \oplus V^*)^*.$$

In other words, the local expression $\theta_\alpha$ for the differential form $\theta$ in terms of
the chart $T^*(\alpha)$ is given by

$$\theta_\alpha<u_1, u_2> = <u_2, 0>.$$   (3.2)


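If M is finite-dimensional and we use the local coordinates
$<q^1, \ldots, q^n, p^1, \ldots, p^n>$, then
$$\pi_*\left(\frac{\partial}{\partial q^i}\right)_z = \left(\frac{\partial}{\partial x^i}\right)_{\pi(z)},$$
while
$$\pi_*\left(\frac{\partial}{\partial p^i}\right)_z = 0 \quad\text{and}\quad z = \sum_j p^j(z)\,dx^j.$$
Thus
$$\left\langle \left(\frac{\partial}{\partial q^i}\right)_z, \theta_z\right\rangle = \left\langle \left(\frac{\partial}{\partial x^i}\right)_{\pi(z)}, \sum_j p^j(z)\,dx^j\right\rangle = p^i(z),$$
while
$$\left\langle \left(\frac{\partial}{\partial p^i}\right)_z, \theta_z\right\rangle = 0,$$
so that
$$\theta_z = \sum_i p^i(z)\,dq^i$$
or

$$\theta = \sum_i p^i\,dq^i.$$   (3.3)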


Of course, (3.3) is just a way of writing (3.2) in terms of a basis of $V = \mathbb{R}^n$.
Let $\varphi: M_1 \to M_2$ be a diffeomorphism, let $\theta_1$ be the fundamental linear form
on $T^*(M_1)$, and let $\theta_2$ be the fundamental linear form on $T^*(M_2)$. Let

$$\xi \in T_z(T^*(M_1)) \quad\text{and}\quad \pi(z) = x \in M_1.$$

Then
$$\pi_*T^*(\varphi)_*\xi = \varphi_*\pi_*\xi,$$
and so
$$\langle T^*(\varphi)_*\xi, (\theta_2)_{T^*(\varphi)z}\rangle = \langle \pi_*T^*(\varphi)_*\xi, T^*(\varphi)z\rangle = \langle \varphi_*\pi_*\xi, (\varphi^*)^{-1}z\rangle = \langle \pi_*\xi, z\rangle = \langle \xi, (\theta_1)_z\rangle.$$
In other words,
$$(T^*(\varphi))^*\theta_2 = \theta_1.$$   (3.4)
In particular, if $\{\varphi_t\}$ is a flow on M, then
$$(T^*(\varphi_t))^*\theta = \theta.$$

If X is a vector field on M, then the infinitesimal version of the last equation is

$$D_{T^*(X)}\theta = 0.$$   (3.5)
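Note that any vector field X on M defines a function $f_X$ on $T^*(M)$ by
$$f_X(z) = \langle X_x, z\rangle \quad\text{if } \pi(z) = x.$$   (3.6)
We also have
$$f_X = \langle T^*(X), \theta\rangle,$$   (3.7)

since $\langle T^*(X)_z, \theta_z\rangle = \langle \pi_*T^*(X)_z, z\rangle = \langle X_x, z\rangle$ by (2.15).
Finally, in view of (3.5) and Eq. (6.14), Chapter 11, we have

$$0 = D_{T^*(X)}\theta = d\langle T^*(X), \theta\rangle + T^*(X) \,\lrcorner\, d\theta,$$
so that
$$df_X = -T^*(X) \,\lrcorner\, d\theta.$$   (3.8)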


For reasons which will become clear later on, the function fx is sometimes called 
the momentum function associated to the vector field X. 


4. THE FUNDAMENTAL EXTERIOR TWO-FORM ON T*(M) 
It turns out that the exterior two-form
$$\Omega = d\theta$$   (4.1)

plays a fundamental role in mechanics, and we therefore study some of its
properties. First of all, since $d^2 = 0$, we have

$$d\Omega = 0.$$   (4.2)

We claim that $\Omega_z$ is a nonsingular bilinear form on $T_z(T^*(M))$ for each
$z \in T^*(M)$. That is,

if $\xi \in T_z(T^*(M))$ is such that $\xi \,\lrcorner\, \Omega_z = 0$, then $\xi = 0$.   (4.3)




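For this purpose we compute the local expression for $\Omega$ in terms of a chart
$(\pi^{-1}(U), T^*(\alpha))$ [where $z \in \pi^{-1}(U)$]. The map $T^*(\alpha)$ gives a diffeomorphism
of $\pi^{-1}(U)$ with a subset of $V \oplus V^*$, and by (3.4) the form $\theta$ on $T^*(M)$ carries over
to the corresponding form $\theta_\alpha$ on $V \oplus V^* = T^*(V)$. Also, $\theta_\alpha$ can be regarded as
a $(V^* \oplus V)$-valued function which is given by (3.2). Let us denote $\xi_{T^*(\alpha)}$ by $\xi_\alpha$,
and let

$$\xi_\alpha = <X_1, X_2> \in V \oplus V^*$$

be considered a constant vector field on $V \oplus V^*$. Then
$$\langle \xi_\alpha, \theta_\alpha\rangle_{<u_1, u_2>} = \langle X_1, u_2\rangle,$$
and so $d\langle \xi_\alpha, \theta_\alpha\rangle$ is the linear differential form given by
$$\langle \eta_\alpha, d\langle \xi_\alpha, \theta_\alpha\rangle\rangle = \langle X_1, Y_2\rangle \quad\text{if } \eta_\alpha = <Y_1, Y_2> \in V \oplus V^*.$$

On the other hand, since $\xi_\alpha$ is a constant vector field, the Lie derivative $D_{\xi_\alpha}\theta_\alpha$
reduces to the ordinary derivative of the linear function $<u_1, u_2> \mapsto <u_2, 0>$.
This derivative is just the constant $<X_2, 0>$. Thus the Lie derivative $D_{\xi_\alpha}\theta_\alpha$
is given by the constant linear form

$$D_{\xi_\alpha}\theta_\alpha = <X_2, 0> \in V^* \oplus V;$$
that is,
$$\langle \eta_\alpha, D_{\xi_\alpha}\theta_\alpha\rangle = \langle Y_1, X_2\rangle.$$

Now $\xi_\alpha \,\lrcorner\, \Omega_\alpha = \xi_\alpha \,\lrcorner\, d\theta_\alpha = D_{\xi_\alpha}\theta_\alpha - d\langle \xi_\alpha, \theta_\alpha\rangle$, so we see that $\xi_\alpha \,\lrcorner\, \Omega_\alpha$ is the
constant form
$$\xi_\alpha \,\lrcorner\, d\theta_\alpha = <X_2, -X_1> \in V^* \oplus V.$$

To recapitulate, if $\xi \in T_z(T^*(M))$ is such that $\xi_{T^*(\alpha)}$ is given by
$\xi_{T^*(\alpha)} = <X_1, X_2> \in V \oplus V^*$, then

$$(\xi \,\lrcorner\, \Omega)_{T^*(\alpha)} = <X_2, -X_1> \in V^* \oplus V.$$   (4.4)

Equation (4.3) clearly follows from this.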

Since (4.3) is of fundamental importance, we present an alternative derivation
for the case of a finite-dimensional manifold. If we use the coordinates
$<q^1, \ldots, q^n, p^1, \ldots, p^n>$, then

$$\theta = \sum_i p^i\,dq^i,$$
so that
$$\Omega = \sum_i dp^i \wedge dq^i.$$   (4.5)
If
$$X = \sum_i \left(A^i\frac{\partial}{\partial q^i} + B^i\frac{\partial}{\partial p^i}\right),$$
then

$$X \,\lrcorner\, \Omega = \sum_i (B^i\,dq^i - A^i\,dp^i).$$   (4.6)

Thus $(X \,\lrcorner\, \Omega)_z = 0$ if and only if $X_z = 0$. This shows that on $T^*(M)$ we
have a one-to-one correspondence, $X \leftrightarrow X \,\lrcorner\, \Omega$, between vector fields and
linear differential forms. Let us denote by $\omega_X$ the form corresponding to X,
so that
$$\omega_X = X \,\lrcorner\, \Omega;$$

and let us denote by $X_\omega$ the vector field corresponding to the linear differential
form $\omega$. Thus
$$\omega = X_\omega \,\lrcorner\, \Omega = \omega_{X_\omega}.$$
Observe that
$$d\omega = 0 \quad\text{if and only if}\quad D_{X_\omega}\Omega = 0.$$   (4.7)

In fact, by (4.2),
$$D_{X_\omega}\Omega = d(X_\omega \,\lrcorner\, \Omega) = d\omega.$$

In particular, there is a distinguished class of vector fields on $T^*(M)$: those
corresponding to functions, i.e., the vector fields of the form $X_{dF}$, where F is a
function on $T^*(M)$. These vector fields are called Hamiltonian vector fields.


*In view of (4.7), if $D_X\Omega = 0$, then locally, at least, X is of the form $X_{dF}$,
since any form $\omega$ with $d\omega = 0$ can be written locally as $\omega = dF$. If we make the
topological assumption that $T^*(M)$ has vanishing first cohomology group, then
$D_X\Omega = 0$ is equivalent to X being Hamiltonian. This assumption is really a
restriction on the nature of the configuration space M. Since we do not wish to
restrict M in this manner, we will not take $D_X\Omega = 0$ as the definition of a
Hamiltonian vector field.*


Note that if X and Y are Hamiltonian vector fields, so is aX + bY, where a
and b are constants. In fact, if $X = X_{dF}$ and $Y = X_{dG}$, then

$$aX + bY = X_{d(aF + bG)}.$$
Furthermore, [X, Y] is also a Hamiltonian vector field. In fact,

$$D_X(Y \,\lrcorner\, \Omega) = D_XY \,\lrcorner\, \Omega + Y \,\lrcorner\, D_X\Omega = D_XY \,\lrcorner\, \Omega,$$

since $D_X\Omega = 0$. Since $D_XY = [X, Y]$, we see that
$$[X, Y] \,\lrcorner\, \Omega = D_X\,dG = d\,D_XG.$$

In other words, we have

$$[X_{dF}, X_{dG}] = X_{d(X_{dF}G)}.$$   (4.8)

We thus see that we have a binary operation on functions corresponding to the
Lie bracket on Hamiltonian vector fields. It is called the Poisson bracket and is
denoted by {F, G}. In other words, we define

$$\{F, G\} = X_{dF}G,$$   (4.9)
so that we can rewrite (4.8) as
$$[X_{dF}, X_{dG}] = X_{d\{F, G\}}.$$   (4.8′)




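Note that
$$X_{dF}G = \langle X_{dF}, dG\rangle = \langle X_{dF}, X_{dG} \,\lrcorner\, \Omega\rangle = \langle X_{dG} \wedge X_{dF}, \Omega\rangle,$$   (4.10)

so that, in particular,

$$\{F, G\} = -\{G, F\}.$$   (4.11)
In terms of local coordinates $<q^1, \ldots, q^n, p^1, \ldots, p^n>$, we have
$$dF = \sum_i \left(\frac{\partial F}{\partial q^i}\,dq^i + \frac{\partial F}{\partial p^i}\,dp^i\right),$$
so that by (4.4),
$$X_{dF} = \sum_i \left(\frac{\partial F}{\partial q^i}\frac{\partial}{\partial p^i} - \frac{\partial F}{\partial p^i}\frac{\partial}{\partial q^i}\right)$$   (4.12)
and therefore
$$\{F, G\} = \sum_i \left(\frac{\partial F}{\partial q^i}\frac{\partial G}{\partial p^i} - \frac{\partial F}{\partial p^i}\frac{\partial G}{\partial q^i}\right),$$   (4.13)

from which the antisymmetry (4.11) is apparent. A consequence of (4.11) is
the following:

Proposition 4.1. If F and G are functions on $T^*(M)$ such that

$$X_{dF}G = 0,$$
then
$$X_{dG}F = 0.$$

In other words, if G is constant along the solution curves of $X_{dF}$, then F is
constant along the solution curves of $X_{dG}$.

In fact,
$$X_{dF}G = \{F, G\} = -\{G, F\} = -X_{dG}F.$$

It will turn out that Proposition 4.1 is the prototype of all the “conservation
laws” of mechanics.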
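The coordinate formula (4.13) and the antisymmetry (4.11) are easy to test numerically. The following is a sketch for one degree of freedom, using the sign convention $\{F, G\} = F_q G_p - F_p G_q$ of Eq. (4.13). The sample functions $F = q^2 + p^2$ and $G = qp$, the finite-difference step, and the helper names are assumptions for illustration.

```python
def partial(f, q, p, wrt, h=1e-5):
    """Central-difference partial derivative of f(q, p) with respect to 'q' or 'p'."""
    if wrt == "q":
        return (f(q + h, p) - f(q - h, p)) / (2 * h)
    return (f(q, p + h) - f(q, p - h)) / (2 * h)

def poisson(F, G, q, p):
    """{F, G} = F_q G_p - F_p G_q, the sign convention of Eq. (4.13)."""
    return (partial(F, q, p, "q") * partial(G, q, p, "p")
            - partial(F, q, p, "p") * partial(G, q, p, "q"))

F = lambda q, p: q * q + p * p
G = lambda q, p: q * p
q0, p0 = 0.7, -1.2
fg = poisson(F, G, q0, p0)    # by hand: 2q*q - 2p*p = 2(q^2 - p^2)
gf = poisson(G, F, q0, p0)
```

The two values `fg` and `gf` are negatives of each other, and `fg` matches the hand computation $2(q^2 - p^2)$ at the sample point.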

We close this section with the following observation. Let Y be a vector
field on M. Then the momentum function of Y is a function $f_Y$ on $T^*(M)$.
Equation (3.8) asserts that

$$-T^*(Y) = X_{df_Y}.$$   (4.14)


5. HAMILTONIAN MECHANICS 


As we indicated in the introduction, the first fundamental assumption of mechanics
is that the evolution of the system is determined by a flow on $T^*(M)$,
where M is the configuration space of the system. The second fundamental
assumption concerns the character of the flow. It is assumed that the infinitesimal
generator of the flow is a Hamiltonian vector field. That is, it is assumed
that there is a function H (called the energy) on $T^*(M)$ such that the vector
field $X_{-dH}$ is the infinitesimal generator of the flow on $T^*(M)$ describing the
evolution of the system. (The minus sign is a consequence of certain standard
conventions.) In order to see what this means, let us express the equations of
motion in terms of q- and p-coordinates for a finite-dimensional system. Thus

$$X_{-dH} = \sum_i \left(\frac{\partial H}{\partial p^i}\frac{\partial}{\partial q^i} - \frac{\partial H}{\partial q^i}\frac{\partial}{\partial p^i}\right).$$

Thus, if $<q^1(\cdot), \ldots, q^n(\cdot), p^1(\cdot), \ldots, p^n(\cdot)>$ is an integral curve of the flow,
it must satisfy the differential equations
$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p^i} \quad\text{and}\quad \frac{dp^i}{dt} = -\frac{\partial H}{\partial q^i}.$$   (5.1)
A trivial consequence of (4.11) is that
$$X_{-dH}H = 0.$$

In other words, the function H is constant along trajectories of the system.
This principle is known as the law of conservation of energy.
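Conservation of energy along solutions of Hamilton's equations (5.1) can be observed in a small simulation. The sketch below assumes a specific Hamiltonian, $H = \frac12 p^2 - \cos q$ (a pendulum with unit mass and length), which is not taken from the text, and integrates (5.1) with a classical Runge-Kutta scheme. The step size and initial state are likewise illustrative.

```python
import math

def hamiltonian(q, p):
    """Hypothetical example: a pendulum with unit mass and length."""
    return 0.5 * p * p - math.cos(q)

def rhs(q, p):
    """Hamilton's equations (5.1): dq/dt = dH/dp, dp/dt = -dH/dq."""
    return p, -math.sin(q)

def rk4(q, p, h):
    """One classical Runge-Kutta step of size h."""
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5 * h * k1q, p + 0.5 * h * k1p)
    k3q, k3p = rhs(q + 0.5 * h * k2q, p + 0.5 * h * k2p)
    k4q, k4p = rhs(q + h * k3q, p + h * k3p)
    return (q + h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6,
            p + h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)

q, p = 1.0, 0.0
energy_start = hamiltonian(q, p)
for _ in range(10000):          # integrate to t = 10 with step 0.001
    q, p = rk4(q, p, 0.001)
energy_drift = abs(hamiltonian(q, p) - energy_start)
```

The remaining `energy_drift` is discretization error of the integrator; the exact flow keeps H constant.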
More generally, we can formulate Proposition 4.1 as:

Proposition 5.1. Let $X_{-dH}$ be the infinitesimal generator of a mechanical
system with energy H. Let F be a function such that

$$X_{dF}H = 0.$$

Then F is constant on the trajectories, i.e., solution curves, of the flow
generated by $X_{-dH}$.

Proposition 5.1 is the prototype of all the “momentum conservation” laws
we shall derive later; see, for example, the discussion at the beginning of Section 6.

In order to specify the mechanical system, one must give the function H.
It turns out (in many but not all cases) that the energy is the sum of two terms,
H = K + U, where K is called the kinetic energy and U is called the potential
energy. They each have a very special form which we now describe.

The kinetic energy is a function on $T^*(M)$ which is associated with a
Riemann metric on M. Let ( , ) be a Riemann metric on M. It gives a scalar
product $( , )_x$ on each $T_x(M)$, and therefore induces an isomorphism of $T_x(M)$
with $T_x^*(M)$, and thus gives a scalar product on $T_x^*(M)$ which we will continue
to denote by ( , ). The function K is then defined by

$$K(l) = \tfrac12(l, l).$$   (5.2)


To understand the relevance of a Riemann metric to mechanics, let us consider the
most elementary case: the study of a single particle of mass m in $E^3$. The usual relation
between velocity and momentum,

$$p = m\dot q,$$
can be formulated as follows: Consider the Riemann metric on $E^3$ which is
m × the Euclidean metric. That is, if $<x, y, z>$ are rectangular coordinates on
$E^3$ and $<q_x, q_y, q_z, \dot q_x, \dot q_y, \dot q_z>$ are the corresponding coordinates on $T(E^3)$,
then

$$\|<\dot q_x, \dot q_y, \dot q_z>\|^2 = m\dot q_x^2 + m\dot q_y^2 + m\dot q_z^2.$$
Then the map of $T(E^3) \to T^*(E^3)$ sends

$$<q_x, q_y, q_z, \dot q_x, \dot q_y, \dot q_z> \mapsto <q_x, q_y, q_z, p_x, p_y, p_z>,$$
where
$$p_x = m\dot q_x, \quad p_y = m\dot q_y, \quad p_z = m\dot q_z,$$   (5.3)
and

$$K(q_x, q_y, q_z, p_x, p_y, p_z) = \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2).$$   (5.4)


Thus the passage from velocity to momentum depends on the choice of a
Riemann metric, which determines a map from $T(M) \to T^*(M)$. This can be
regarded as a generalized “choice of mass” for the configuration.

The function U is assumed to be of the form $U = \bar U \circ \pi$, where $\bar U$ is a
function on M. The form $F = -d\bar U$ is called the force field whose potential is $\bar U$.
It can be regarded as a vector field on M in view of the Riemann metric on M:

$$(\xi, F_x) = -\langle \xi, d\bar U\rangle \quad\text{for any } \xi \in T_x(M).$$


In the special case of a single particle in $E^3$ with mass m, substituting (5.3)
and (5.4) into Eq. (5.1) gives, when we write H = K + U,

$$m\frac{dq^i}{dt} = p^i \quad\text{and}\quad \frac{dp^i}{dt} = -\frac{\partial U}{\partial q^i} = F^i$$   (5.5)
or, since m is constant,
$$m\frac{d^2q^i}{dt^2} = F^i,$$   (5.6)

which is the usual rule stating that force = mass × acceleration.
Returning to the general theory, we now formulate a useful corollary of 
Proposition 5.1. 


Proposition 5.2. Let H = K + U be the energy associated with the
Riemann metric ( , ) and the function $\bar U$ on M. Let Y be a vector field on
M which is an infinitesimal isometry of ( , ) and is such that $Y\bar U = 0$.
Then the momentum function $f_Y$ is constant under the flow generated
by $X_{-dH}$.

Let $\{\varphi_t\}$ be the flow generated by Y. Since $\varphi_t$ is a local isometry,
$(\varphi_t^*l, \varphi_t^*l) = (l, l)$ wherever $\varphi_t^*l$ is defined, so that

$$K(T^*(\varphi_t)l) = K(l),$$

and thus
$$T^*(Y)K = 0.$$

Also,
$$T^*(Y)U = \langle T^*(Y), dU\rangle = \langle T^*(Y), \pi^*d\bar U\rangle = \langle \pi_*T^*(Y), d\bar U\rangle = \langle Y, d\bar U\rangle = 0,$$

so that $T^*(Y)H = 0$ and Proposition 5.1 applies [see Eq. (4.14)].


6. THE CENTRAL-FORCE PROBLEM 


Before proceeding with the general theory, we illustrate the previous results in
a simple but important case. We will study the motion of a particle in $E^3$ acted
on by a “force centered at the origin”. That is, our configuration space M will
be taken to be $E^3 - \{0\}$, and the Riemann metric is m × the Euclidean metric,
where m is the “mass” of the particle. We also assume that the function $\bar U$
depends only on the distance to the origin. Thus $\bar U(x, y, z) = P(r)$, where
$r^2 = x^2 + y^2 + z^2$. Under these circumstances it is clear that any rotation
about the origin is an isometry of the Riemann metric and preserves $\bar U$. We
can therefore apply Proposition 5.2 to the infinitesimal rotations
$x(\partial/\partial y) - y(\partial/\partial x)$ to conclude that the corresponding momentum function $xp_y - yp_x$ is
constant. (This momentum function is known as the angular momentum about
the z-axis.) Similarly, the functions $xp_z - zp_x$ and $yp_z - zp_y$ must be constant
on any trajectory of the flow. If we write $\mathbf{x} = <x, y, z>$ and $\mathbf{p} = <p_x, p_y, p_z>$,
then these three conservation laws can be combined to read

$$\mathbf{x} \wedge \mathbf{p} = \text{const},$$   (6.1)

which is known as the law of conservation of angular momentum. Here $\mathbf{x}$ and $\mathbf{p}$
are considered vector-valued functions on $T^*(M)$. In order to study the implications
of (6.1), let us first distinguish two cases: where the constant $\neq 0$ and
where the constant vanishes. If the constant occurring in (6.1) does not vanish,
then (6.1) implies that the plane spanned by $\mathbf{x}$ and $\mathbf{p}$ does not change. In
particular, the motion is such that $\mathbf{x}$ lies in a fixed plane. If $\mathbf{x} \wedge \mathbf{p} = 0$, we
argue as follows:
Since $\mathbf{p} = m\dot{\mathbf{x}}$, we have $\mathbf{x} \wedge \dot{\mathbf{x}} = 0$. Since $\mathbf{x} \neq 0$, this implies that

$$\dot{\mathbf{x}} = \lambda\mathbf{x}$$
for some function $\lambda$ of time. Now $\|\mathbf{x}\| = (\mathbf{x}, \mathbf{x})^{1/2}$, so
$$\frac{d\|\mathbf{x}\|}{dt} = \frac{1}{\|\mathbf{x}\|}(\dot{\mathbf{x}}, \mathbf{x}) = \left(\dot{\mathbf{x}}, \frac{\mathbf{x}}{\|\mathbf{x}\|}\right) = \lambda\|\mathbf{x}\|,$$   (6.2)

and therefore

$$\frac{d}{dt}\left(\frac{\mathbf{x}}{\|\mathbf{x}\|}\right) = \frac{\dot{\mathbf{x}}}{\|\mathbf{x}\|} - \frac{\mathbf{x}}{\|\mathbf{x}\|^2}\frac{d\|\mathbf{x}\|}{dt} = \frac{\lambda\mathbf{x}}{\|\mathbf{x}\|} - \frac{\lambda\|\mathbf{x}\|\,\mathbf{x}}{\|\mathbf{x}\|^2} = 0,$$

so that $\mathbf{x}/\|\mathbf{x}\| = \text{const}$; that is, $\mathbf{x}$ lies on a fixed ray through the origin.




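If we differentiate (6.2) and make use of the fact that $\mathbf{x}/\|\mathbf{x}\|$ is constant, we get, by (5.6),

$$m\frac{d^2}{dt^2}\|\mathbf{x}\| = \Big(m\ddot{\mathbf{x}}, \frac{\mathbf{x}}{\|\mathbf{x}\|}\Big) = -P'(\|\mathbf{x}\|);$$

that is,
$$m\frac{d^2 r}{dt^2} = -P'(r), \tag{6.3}$$

where $r = \|\mathbf{x}\|$.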

In any event, in all cases the particle $\mathbf{x}$ moves in a plane. We may therefore restrict our attention to the plane. That is, we can consider a “new” mechanical system where $M = E^2 - \{0\}$ and its Riemann metric, the function $U = P(r)$, etc. Let us introduce polar “coordinates” in the plane. Then $\partial/\partial\theta$ preserves the metric and the potential. If $\langle r, \theta, p_r, p_\theta\rangle$ are the corresponding coordinates on $T^*(M)$, then Proposition 5.2 implies that

$$p_\theta = \text{const}. \tag{6.4}$$

(Note that this is really just that part of (6.1) that we haven’t yet used.) Now in terms of polar coordinates, the Euclidean metric has the form of Eq. (8.7) of Chapter 9, so that the Riemann metric associated to the mass $m$, which is $m \times$ the Euclidean metric, is given by

$$\|\langle\dot r, \dot\theta\rangle\|^2 = m(\dot r^2 + r^2\dot\theta^2).$$
In particular, the associated map from $T(M)$ to $T^*(M)$ is given by

$$p_r = m\dot r \quad\text{and}\quad p_\theta = mr^2\dot\theta. \tag{6.5}$$
Thus (6.4) says that
$$r^2\dot\theta = \text{const}. \tag{6.4'}$$


To understand the significance of (6.4'), consider a curve $\mathbf{x}(\cdot)$ in the plane. Consider the region $U_t$ bounded by the portion of the ray from 0 to $\mathbf{x}(0)$, the portion of the ray from 0 to $\mathbf{x}(t)$, and the curve $\mathbf{x}(\cdot)$ from 0 to $t$. (See Fig. 13.1.)

Fig. 13.1

Then
$$\tfrac12\int_{\partial U_t} r^2\,d\theta = \int_{U_t} r\,dr\wedge d\theta$$
is the area of $U_t$. On the other hand, since $d\theta = 0$ on the rays, we see that
$$\int_{\partial U_t} r^2\,d\theta = \int_0^t r^2\dot\theta\,dt.$$

Thus (6.4') is exactly the content of Kepler’s second law: The particle sweeps out area at a constant rate.

Thus we have seen that the spherical symmetry of the system implies that the motion is in a plane and that Kepler’s second law holds.
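The conservation of the angular momentum $xp_y - yp_x$ can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes planar motion, an inverse-square potential with an illustrative constant `alpha = 1`, and a velocity-Verlet integrator chosen here only for convenience (the names `central_force` and `simulate` are hypothetical).

```python
import math

def central_force(x, y, dP):
    """Force -P'(r) * (x, y)/r for a potential U = P(r); dP is the derivative P'."""
    r = math.hypot(x, y)
    f = -dP(r) / r
    return f * x, f * y

def simulate(x, y, px, py, m, dP, dt, steps):
    """Velocity-Verlet integration; returns the angular momentum x*py - y*px at each step."""
    history = [x * py - y * px]
    fx, fy = central_force(x, y, dP)
    for _ in range(steps):
        # half kick: p changes along x, so x ^ p is unchanged
        px += 0.5 * dt * fx
        py += 0.5 * dt * fy
        # drift: x changes along p, so x ^ p is unchanged
        x += dt * px / m
        y += dt * py / m
        fx, fy = central_force(x, y, dP)
        # second half kick
        px += 0.5 * dt * fx
        py += 0.5 * dt * fy
        history.append(x * py - y * px)
    return history

# Assumed example: P(r) = -alpha/r, hence P'(r) = alpha/r**2.
alpha = 1.0
angular_momentum = simulate(1.0, 0.0, 0.0, 1.2, 1.0, lambda r: alpha / r**2, 1e-3, 20000)
drift = max(abs(a - angular_momentum[0]) for a in angular_momentum)
```

Here `drift` measures how far $xp_y - yp_x$ wanders from its initial value along the computed trajectory; for a central force it should stay at the level of rounding error, since each half kick moves $\mathbf{p}$ parallel to $\mathbf{x}$ and each drift moves $\mathbf{x}$ parallel to $\mathbf{p}$.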




We have not yet made use of the conservation of energy, which in the present context reads

$$\tfrac12 m\dot r^2 + \tfrac12 mr^2\dot\theta^2 + P(r) = \text{const} = \frac{1}{2m}p_r^2 + \frac{1}{2mr^2}p_\theta^2 + P(r). \tag{6.6}$$
Let us examine a particular solution curve for which $p_\theta = A$ = const. Then differentiating (6.6) gives

$$\frac{dp_r}{dt} = m\frac{d^2 r}{dt^2} = \frac{A^2}{mr^3} - P'(r). \tag{6.7}$$

We can interpret (6.7) as the equations of motion of a one-dimensional mechanical system. The kinetic energy of this system is $\tfrac12 m\dot r^2$, and the potential energy $Q$ is given by

$$Q(r) = P(r) + \frac{A^2}{2mr^2}. \tag{6.8}$$

The second term in (6.8) is known as the centrifugal potential and the corresponding term, $A^2/mr^3$, occurring in (6.7) is called the centrifugal force. Note that if $A = 0$, then (6.7) reduces to (6.3), which is what we would expect.
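As a worked illustration of the effective potential, and assuming the inverse-square potential $P(r) = -\alpha/r$ with $\alpha > 0$ that is treated later in this section (the symbol $r_c$ is introduced here only for the example), a circular orbit is an equilibrium of the one-dimensional system, that is, a critical point of $Q$:

```latex
Q(r) = -\frac{\alpha}{r} + \frac{A^2}{2mr^2}, \qquad
Q'(r) = \frac{\alpha}{r^2} - \frac{A^2}{mr^3} = 0
\;\Longrightarrow\;
r_c = \frac{A^2}{m\alpha}, \qquad
Q(r_c) = -\frac{m\alpha^2}{2A^2}.
```

At $r = r_c$ the centrifugal force $A^2/mr^3$ exactly balances the attraction $P'(r) = \alpha/r^2$, so the right-hand side of the radial equation of motion vanishes and $r$ stays constant.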


Fig. 13.2 


We can now use the following procedure for solving the equations of motion: First, for a given angular momentum find the various solution curves $r(\cdot)$ of (6.7). For a solution $r(\cdot)$, determine $\theta(\cdot)$ by integrating $\dot\theta = A/mr^2$ to get

$$\theta(t) = \int^t \frac{A}{mr^2(s)}\,ds + \text{const}. \tag{6.9}$$

We can obtain a good bit of information about the nature of the solutions of (6.7) by using the law of conservation of energy. We draw a graph of the function $Q$, as shown in Fig. 13.2. Suppose that the trajectory $r(\cdot)$ has the constant energy $E_0 = \tfrac12 m\dot r^2 + Q(r)$. Then if the set $\{r : Q(r) \le E_0\}$ is bounded, $r(t)$ must always be in this set, since $\tfrac12 m\dot r^2 \ge 0$. In fact, suppose that the interval $a \le r \le b$ is such that $Q(a) = Q(b) = E_0$ and $Q(b + \epsilon) > E_0$ and $Q(a - \epsilon) > E_0$ for small $\epsilon > 0$. Then if $a \le r(t_0) \le b$ for some value of $t_0$, it follows




that $a \le r(t) \le b$ for all $t$. Furthermore, if $Q(r) < E_0$ for $a < r < b$, then for $a < r(t_0) < b$, $\dot r(t)$ cannot vanish. The particle is thus moving to one of the limits $r = a$ and $r = b$. If we use the law of conservation of energy, we see that

$$\tfrac12 m\dot r^2 + Q(r) = E_0,$$
so that

$$\dot r = \pm\sqrt{\frac{2}{m}\big(E_0 - Q(r)\big)}.$$

For $a < r < b$ we see that $r$ is a monotone function of $t$, and we can solve to obtain $t$ as a function of $r$ by the formula

$$t(r) = \pm(\tfrac12 m)^{1/2}\int_{r_0}^{r}\frac{dx}{\sqrt{E_0 - Q(x)}} + t_0 \quad\text{if } r(t_0) = r_0. \tag{6.10}$$
In particular, if $Q'(a) \neq 0$ and $Q'(b) \neq 0$, the integral in (6.10) converges, so that it takes a finite amount of time for $r$ to get to $a$ or to $b$. Thus (since $m\,d\dot r/dt = -Q'(r) \neq 0$ there), the function $r$ oscillates between $a$ and $b$, taking the time

$$T = (\tfrac12 m)^{1/2}\int_a^b\frac{dx}{\sqrt{E_0 - Q(x)}}$$
to get from one side to the other.
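The travel time between the two turning points can be evaluated numerically. The following is a hedged sketch under stated assumptions: the function names, the substitution $x = c + h\sin\varphi$ used to tame the endpoint singularities, and the midpoint rule are choices made here, not the text's method, and the turning points $a$ and $b$ are assumed to be known in advance.

```python
import math

def effective_potential(r, P, A, m):
    """Q(r) = P(r) + A^2 / (2 m r^2), the effective one-dimensional potential."""
    return P(r) + A * A / (2.0 * m * r * r)

def travel_time(a, b, E0, P, A, m, n=2000):
    """Approximate (m/2)^(1/2) * integral from a to b of dx / sqrt(E0 - Q(x)).

    The substitution x = c + h*sin(phi), phi in (-pi/2, pi/2), cancels the
    inverse-square-root singularities at the turning points, and the
    midpoint rule never samples the endpoints themselves.
    """
    c, h = 0.5 * (a + b), 0.5 * (b - a)
    dphi = math.pi / n
    total = 0.0
    for k in range(n):
        phi = -0.5 * math.pi + (k + 0.5) * dphi
        x = c + h * math.sin(phi)
        total += h * math.cos(phi) / math.sqrt(E0 - effective_potential(x, P, A, m))
    return math.sqrt(0.5 * m) * total * dphi

# Assumed example: P(r) = -alpha/r with m = alpha = 1, A = 1.2, E0 = -0.28.
# The turning points are the roots of E0*x^2 + alpha*x - A^2/(2m) = 0.
m, alpha, A, E0 = 1.0, 1.0, 1.2, -0.28
disc = math.sqrt(alpha**2 + 2.0 * E0 * A**2 / m)
r_min = (-alpha + disc) / (2.0 * E0)
r_max = (-alpha - disc) / (2.0 * E0)
half_period = travel_time(r_min, r_max, E0, lambda r: -alpha / r, A, m)
```

For the inverse-square potential this value can be compared with half of the Kepler period, $\pi a^{3/2}\sqrt{m/\alpha}$ with $a = \alpha/2|E_0|$, which gives an independent check of the quadrature.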

If $\{r : Q(r) \le E_0\}$ is not bounded, then the motion with energy $E_0$ need not be bounded. For instance, suppose that $Q(r) < E_0$ for $r > a$. Then if $r(t_0) > a$ and $\dot r(t_0) < 0$, $r$ will decrease until it hits $a$, at which time it will turn around and then go off to infinity; if $\dot r(t_0) > 0$, then $r$ will simply go to infinity. The trajectory in this case is that of $r$ coming in from infinity, turning around at $a$, and then going back to infinity.

We recall that this is a descriptive analysis of the function $r(\cdot)$. The curve $\theta(\cdot)$ is then to be determined by (6.9).

Frequently we are interested in the trajectories as curves in the $r\theta$-plane without reference to the time dependence. Suppose that the energy is $E$ and the angular momentum is $A$. For the range where $\dot r \neq 0$ we can substitute

$$\frac{d\theta}{dt} = \frac{A}{mr^2}$$

into (6.10) to obtain [since $d\theta/dr = (d\theta/dt)(dt/dr)$]

$$\theta(r) = \pm\Big(\frac{1}{2m}\Big)^{1/2}\int_{r_0}^{r}\frac{A\,dx}{x^2\sqrt{E - Q(x)}} + \theta(r_0).$$

In the important case where $P(r) = -\alpha/r$, corresponding to the inverse-square law of gravitational attraction, we have

$$Q(x) = \frac{A^2}{2mx^2} - \frac{\alpha}{x},$$




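so that the integrand is
$$\frac{1}{(2m)^{1/2}}\,\frac{A}{x^2\sqrt{E + \dfrac{\alpha}{x} - \dfrac{A^2}{2mx^2}}} = \frac{A}{x^2\sqrt{2mE + \dfrac{2m\alpha}{x} - \dfrac{A^2}{x^2}}} = \frac{d}{dx}\arccos\frac{\dfrac{A}{x} - \dfrac{m\alpha}{A}}{\sqrt{2mE + \dfrac{m^2\alpha^2}{A^2}}}.$$
Thus
$$\theta(r) - \theta(r_0) = \arccos\frac{\dfrac{A}{x} - \dfrac{m\alpha}{A}}{\sqrt{2mE + \dfrac{m^2\alpha^2}{A^2}}}\Bigg|_{x=r_0}^{x=r}.$$

Let us choose $r_0$ to be the minimum value of $r$, and let us choose $\theta(r_0) = 0$. (At $r = r_0$ we have $E = Q(r_0)$, so the argument of the arccos equals 1 and the lower limit contributes nothing.)

Let us set
$$p = \frac{A^2}{m\alpha} \quad\text{and}\quad e = \sqrt{1 + \frac{2EA^2}{m\alpha^2}}.$$

We see that the equation for the orbit is

$$\frac{p}{r} = 1 + e\cos\theta. \tag{6.11}$$

This is the equation of a conic section (Kepler’s first law). If $E < 0$, then $e < 1$ and (6.11) represents an ellipse, whose major and minor semiaxes are

$$a = \frac{p}{1 - e^2} = \frac{\alpha}{2|E|}$$
and
$$b = \frac{p}{\sqrt{1 - e^2}} = \frac{A}{\sqrt{2m|E|}}.$$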

We leave to the reader the details of working out the hyperbolic and parabolic orbits.

For the elliptic orbit, Kepler’s second law implies that the total area of the interior of the ellipse is swept out at the uniform rate $A/2m$. Thus the area, $\pi ab$, of the ellipse is given by $(A/2m)T$, where $T$ is the period of the motion. Thus

$$\frac{A}{2m}T = \pi ab = \pi a\,\frac{A}{\sqrt{2m|E|}},$$
and hence
$$T = 2\pi a^{3/2}\sqrt{m/\alpha}.$$

In other words, the square of the period of motion is proportional to the cube of the linear dimension of the orbit (Kepler’s third law).
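The orbit parameters and Kepler’s third law can be collected in a short numerical sketch. It assumes the inverse-square potential $P(r) = -\alpha/r$ with $E < 0$; the function names `orbit_elements` and `radius` and the sample values are hypothetical and chosen only for illustration.

```python
import math

def orbit_elements(E, A, m, alpha):
    """Return (p, e, a, b, T) for a bound orbit (E < 0) in the potential -alpha/r."""
    if E >= 0:
        raise ValueError("elliptic orbits require E < 0")
    p = A * A / (m * alpha)                                   # semi-latus rectum
    e = math.sqrt(1.0 + 2.0 * E * A * A / (m * alpha * alpha))  # eccentricity
    a = alpha / (2.0 * abs(E))                                # major semiaxis
    b = A / math.sqrt(2.0 * m * abs(E))                       # minor semiaxis
    T = 2.0 * math.pi * a ** 1.5 * math.sqrt(m / alpha)       # period
    return p, e, a, b, T

def radius(theta, p, e):
    """Solve the orbit equation p/r = 1 + e*cos(theta) for r."""
    return p / (1.0 + e * math.cos(theta))

p, e, a, b, T = orbit_elements(E=-0.28, A=1.2, m=1.0, alpha=1.0)
r_min, r_max = radius(0.0, p, e), radius(math.pi, p, e)
```

With these values the nearest and farthest distances `r_min` and `r_max` add up to the major axis $2a$, and the period agrees with the area law, $T = (2m/A)\,\pi ab$.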




7. THE TWO-BODY PROBLEM 


Let us consider the mechanical system consisting of $n$ particles, the $i$th having mass $m_i$, so that the configuration space is $E^3 \times \cdots \times E^3$ ($n$ times), and if $a = \langle a^1, \dots, a^n\rangle \in M$, where $a^i = \langle x^i, y^i, z^i\rangle \in E^3$, then
$$\tfrac12\|\dot a\|^2 = \tfrac12 m_1\|\dot a^1\|^2 + \cdots + \tfrac12 m_n\|\dot a^n\|^2$$
$$= \tfrac12 m_1[(\dot x^1)^2 + (\dot y^1)^2 + (\dot z^1)^2] + \cdots + \tfrac12 m_n[(\dot x^n)^2 + (\dot y^n)^2 + (\dot z^n)^2]$$
is the kinetic energy of the system. As we indicated in Section 1, we may wish to restrict the manifold $M$ to be that subset of $E^3 \times \cdots \times E^3$ for which no $a^i = a^j$ for any $i \neq j$. Let us further assume that the potential energy depends only on the mutual distances between the particles. That is, let us assume that

$$U(a^1, \dots, a^n) = P(\|a^2 - a^1\|, \|a^3 - a^1\|, \dots, \|a^n - a^{n-1}\|).$$


Then if $A$ is a Euclidean motion of $E^3$ and we apply $A$ simultaneously to all the particles, the kinetic and potential energy are conserved. That is, the transformation

$$\langle a^1, \dots, a^n\rangle \mapsto \langle Aa^1, \dots, Aa^n\rangle$$

is an isometry of the Riemann metric on $M$ and preserves $U$.
Let $\langle x^1, y^1, z^1, \dots, x^n, y^n, z^n\rangle$ be the Cartesian coordinates on $M$, and let
$$\langle q_{x^1}, q_{y^1}, q_{z^1}, \dots, q_{x^n}, q_{y^n}, q_{z^n}, p_{x^1}, \dots, p_{z^n}\rangle$$
be the corresponding coordinates on $T^*(M)$. Now $\partial/\partial x$ is the vector field representing the infinitesimal translation in the x-direction in $E^3$. Therefore,

$$\frac{\partial}{\partial x^1} + \cdots + \frac{\partial}{\partial x^n}$$
is the infinitesimal generator of the one-parameter group
$$\langle x^1, y^1, z^1, x^2, y^2, z^2, \dots, x^n, y^n, z^n\rangle \mapsto \langle x^1 + t, y^1, z^1, x^2 + t, y^2, z^2, \dots, x^n + t, y^n, z^n\rangle.$$
We can therefore apply Proposition 5.2 to conclude that the function
$$p_{x^1} + p_{x^2} + \cdots + p_{x^n}$$
is constant on any trajectory. This function is known as the total linear momentum in the x-direction. Similarly, the total linear momentum in the y-direction,
$$p_{y^1} + \cdots + p_{y^n},$$
and the total linear momentum in the z-direction,
$$p_{z^1} + \cdots + p_{z^n},$$
must be conserved. If we define $p^i \in E^3$ by setting $p^i = \langle p_{x^i}, p_{y^i}, p_{z^i}\rangle$, then




we can say that the $E^3$-valued function
$$p^1 + p^2 + \cdots + p^n$$
must be conserved. This is the law of conservation of total linear momentum. (For two particles the assertion that $p^1 + p^2$ is conserved is just Newton’s law of “equality of action and reaction”. In our setup we see that this law is a reflection of the invariance of the physical situation under translations.)

The vector field $x(\partial/\partial y) - y(\partial/\partial x)$ represents infinitesimal rotation about the z-axis in $E^3$. Therefore,

$$x^1\frac{\partial}{\partial y^1} - y^1\frac{\partial}{\partial x^1} + x^2\frac{\partial}{\partial y^2} - y^2\frac{\partial}{\partial x^2} + \cdots + x^n\frac{\partial}{\partial y^n} - y^n\frac{\partial}{\partial x^n}$$

represents simultaneous infinitesimal rotation of all the particles about the z-axis. Therefore, the function

$$q_{x^1}p_{y^1} - q_{y^1}p_{x^1} + \cdots + q_{x^n}p_{y^n} - q_{y^n}p_{x^n}$$
is conserved. This is the law of conservation of total angular momentum about the z-axis. Similarly, we obtain the law of conservation of total angular momentum about the x- and y-axes. If we set
$$q^i = \langle q_{x^i}, q_{y^i}, q_{z^i}\rangle \in E^3,$$
we can combine the three equations by saying that the function
$$q^1 \wedge p^1 + \cdots + q^n \wedge p^n$$
with values in $\wedge^2(E^3)$ is constant on the trajectories of the motion.
Let us examine more closely the law of conservation of total linear momentum. In view of the fact that

$$p_{x^i} = m_i\frac{dx^i}{dt},$$

etc., we have $p^i = m_i(da^i/dt)$, and thus
$$\frac{d}{dt}(m_1a^1 + m_2a^2 + \cdots + m_na^n) = \text{const}.$$

In other words, the center of mass

$$C = \frac{m_1a^1 + \cdots + m_na^n}{m_1 + \cdots + m_n}$$

moves in a straight line with constant velocity.
Suppose there are only two particles. Then it is reasonable to introduce the center of mass
$$C = \frac{m_1a^1 + m_2a^2}{m_1 + m_2}$$

and the relative position vector
$$d = a^1 - a^2$$




as new coordinates. If we solve for $a^1$ and $a^2$, we get

$$a^1 = C + \frac{m_2\,d}{m_1 + m_2}, \qquad a^2 = C - \frac{m_1\,d}{m_1 + m_2},$$

and therefore the kinetic energy is given by

$$\frac12\,\frac{m_1m_2}{m_1 + m_2}\,\|\dot d\|^2 + \tfrac12(m_1 + m_2)\|\dot C\|^2,$$
while $U(C, d) = P(\|d\|)$.
Thus the motions of the system have the following description: The center of mass has constant linear motion, and the relative position vector $d = a^1 - a^2$ satisfies the equations of motion of a single particle with mass
$$\frac{m_1m_2}{m_1 + m_2}$$

in the central-force field with potential $U = P(\|d\|)$. We can thus apply the results of the preceding section to determine the motions of the particle. In particular, if $P(\|d\|) = \alpha/\|d\|$ is the inverse-square potential, the corresponding two-body problems can be completely solved.

Note that if $m_2$ is very large compared to $m_1$, then $C$ is very close to $a^2$. In this case $d(\cdot)$ is a good approximation to the motion of the smaller particle relative to the larger one.

This is the situation that arises, for example, in the study of the motion of the planets relative to the sun.
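The change to center-of-mass and relative coordinates, and the resulting splitting of the kinetic energy, can be verified numerically. The following is a small sketch; the function names and the sample masses and velocities are hypothetical.

```python
def to_center_of_mass(m1, a1, m2, a2):
    """Return (C, d): the center of mass and the relative position vector a1 - a2."""
    total = m1 + m2
    C = tuple((m1 * u + m2 * v) / total for u, v in zip(a1, a2))
    d = tuple(u - v for u, v in zip(a1, a2))
    return C, d

def kinetic_energy_particles(m1, v1, m2, v2):
    """Kinetic energy written in terms of the two particle velocities."""
    return 0.5 * m1 * sum(u * u for u in v1) + 0.5 * m2 * sum(u * u for u in v2)

def kinetic_energy_reduced(m1, v1, m2, v2):
    """Kinetic energy written in terms of the velocities of C and d."""
    # The coordinate change is linear, so it applies to velocities as well.
    C_dot, d_dot = to_center_of_mass(m1, v1, m2, v2)
    reduced_mass = m1 * m2 / (m1 + m2)
    return (0.5 * reduced_mass * sum(u * u for u in d_dot)
            + 0.5 * (m1 + m2) * sum(u * u for u in C_dot))

m1, m2 = 2.0, 3.0
v1, v2 = (1.0, 0.0, 2.0), (-1.0, 4.0, 0.5)
direct = kinetic_energy_particles(m1, v1, m2, v2)
split = kinetic_energy_reduced(m1, v1, m2, v2)
```

The two values `direct` and `split` agree because the cross terms between the center-of-mass velocity and the relative velocity cancel.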


8. LAGRANGE’S EQUATIONS 


Our discussion of mechanics has led us to a certain kind of vector field $X$ on $T^*(M)$, where $M$ is the configuration manifold. In the case where $H = K + U$, the Riemann metric giving $K$ induces an isomorphism $\mathcal{L}$ [that is, a diffeomorphism which is linear on each $T_x(M)$] from $T(M)$ to $T^*(M)$. We therefore obtain a vector field $Y$ on $T(M)$ such that $\mathcal{L}_*Y_v = X_{\mathcal{L}(v)}$ at any $v \in T(M)$. We can therefore inquire about the form of this vector field $Y$ in terms of local coordinates $\langle q^1, \dots, q^n, \dot q^1, \dots, \dot q^n\rangle$ on $T(M)$, associated with coordinates $\langle x^1, \dots, x^n\rangle$ on $M$. Suppose that the Riemann metric is given by

$$\|\langle\dot q^1, \dots, \dot q^n\rangle\|^2 = \sum g_{ij}(q^1, \dots, q^n)\,\dot q^i\dot q^j.$$

If $\langle q^1, \dots, q^n, p^1, \dots, p^n\rangle$ are corresponding local coordinates on $T^*(M)$, then the diffeomorphism $\mathcal{L}$ is given by

$$q^i = q^i, \qquad p^i = \sum_j g_{ij}(q^1, \dots, q^n)\,\dot q^j. \tag{8.1}$$
We could proceed to use (8.1) to obtain the local expression for $\mathcal{L}_*$ and thereby the local expression for $Y$. However, it is more convenient to argue a little differently. Let $L$ be the function on $T(M)$ defined by

$$L(q^1, \dots, q^n, \dot q^1, \dots, \dot q^n) = \tfrac12\sum g_{ij}\,\dot q^i\dot q^j - \bar U(q^1, \dots, q^n), \tag{8.2}$$


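that is,
$$L = K - \bar U,$$
where $K(v) = \tfrac12\|v\|^2$ and $\bar U(v) = U(\pi(v))$ are the kinetic and potential energy expressed as functions on $T(M)$. We can rewrite (8.1) as
$$q^i = q^i, \qquad p^i = \frac{\partial L}{\partial\dot q^i}. \tag{8.3}$$
Then for any $l \in T^*(M)$, we have
$$H(l) = \langle\mathcal{L}^{-1}(l), l\rangle - L(\mathcal{L}^{-1}(l)). \tag{8.4}$$

In fact, by definition,
$$\langle\mathcal{L}^{-1}(l), l\rangle = \|l\|^2 = \|\mathcal{L}^{-1}(l)\|^2,$$
so that
$$\langle\mathcal{L}^{-1}(l), l\rangle - L(\mathcal{L}^{-1}(l)) = \|l\|^2 - \tfrac12\|l\|^2 + U(\pi(l)) = \tfrac12\|l\|^2 + U(\pi(l)) = H(l).$$
In terms of local coordinates we can write (8.4) as
$$H(q^1, \dots, q^n, p^1, \dots, p^n) = \sum p^i\dot q^i - L(q^1, \dots, q^n, \dot q^1, \dots, \dot q^n), \tag{8.5}$$

where the $\dot q^i$ are regarded as functions of the q’s and p’s via $\mathcal{L}^{-1}$. We can write the map $\mathcal{L}^{-1}$ as
$$q^i = q^i, \qquad \dot q^i = \frac{\partial H}{\partial p^i}, \tag{8.6}$$

since $H(l) = \tfrac12\|l\|^2 + U(\pi(l))$. Furthermore, by (8.5),

$$\frac{\partial H}{\partial q^i} = \frac{\partial}{\partial q^i}\Big(\sum_j p^j\dot q^j - L\Big) = \sum_j p^j\frac{\partial\dot q^j}{\partial q^i} - \frac{\partial L}{\partial q^i} - \sum_j\frac{\partial L}{\partial\dot q^j}\frac{\partial\dot q^j}{\partial q^i} = -\frac{\partial L}{\partial q^i}$$

by (8.3).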
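Now let $v(\cdot) = \langle q^1(\cdot), \dots, q^n(\cdot), \dot q^1(\cdot), \dots, \dot q^n(\cdot)\rangle$ be a solution curve to the vector field $Y$, so that $\mathcal{L}\circ v(\cdot)$ is a solution curve of $X$ on $T^*(M)$. Then by (5.1),

$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p^i} = \dot q^i$$
and
$$\frac{dp^i}{dt} = \frac{d(\partial L/\partial\dot q^i)}{dt} = -\frac{\partial H}{\partial q^i} = \frac{\partial L}{\partial q^i}.$$
In other words, Eqs. (5.1) are equivalent to the equations
$$\frac{dq^i}{dt} = \dot q^i, \qquad \frac{d(\partial L/\partial\dot q^i)}{dt} - \frac{\partial L}{\partial q^i} = 0 \tag{8.7}$$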




on $T(M)$. The first of Eqs. (8.7) says that we are dealing with a system of second-order differential equations which is given implicitly by the second of Eqs. (8.7). Equations (8.7) are known as Lagrange’s equations.
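As a check on Lagrange’s equations, here is a short worked example. It assumes the planar central-force system of Section 6 written in polar coordinates, so that $q^1 = r$ and $q^2 = \theta$:

```latex
L = \tfrac12 m\,(\dot r^2 + r^2\dot\theta^2) - P(r), \\[4pt]
\frac{d}{dt}\frac{\partial L}{\partial \dot r} - \frac{\partial L}{\partial r}
  = m\ddot r - m r\dot\theta^2 + P'(r) = 0, \\[4pt]
\frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta}
  = \frac{d}{dt}\bigl(m r^2\dot\theta\bigr) = 0.
```

The second equation is the conservation law (6.4) with $p_\theta = mr^2\dot\theta = A$. Substituting $\dot\theta = A/mr^2$ into the first gives $m\ddot r = A^2/mr^3 - P'(r)$, which is the radial equation (6.7).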

For certain purposes Lagrange’s equations are very convenient. We illustrate by establishing the “principle of mechanical similarity”. Note that Eqs. (8.7) are unchanged if we replace $L$ by $cL$, where $c$ is any nonzero constant. Suppose that $M$ is an open subset of a vector space with linear coordinates $x^1, \dots, x^n$ and that the Riemann metric is given by $\sum g_{ij}\dot q^i\dot q^j$, where the $g_{ij}$ are constant. Let $m_\alpha$ denote the linear map consisting of multiplication by $\alpha > 0$, that is, $m_\alpha(x^1, \dots, x^n) = \langle\alpha x^1, \dots, \alpha x^n\rangle$. Suppose that $U$ is a homogeneous function of degree $p$, so that

$$U(\alpha x^1, \dots, \alpha x^n) = \alpha^p\,U(x^1, \dots, x^n).$$
Now let us change our time scale by a factor $\beta$, replacing $t$ by $s = \beta t$. Then

$$\frac{d\,\alpha q^i}{ds} = \frac{\alpha}{\beta}\,\frac{dq^i}{dt}$$

and

$$\tfrac12\sum g_{ij}\frac{d\,\alpha q^i}{ds}\frac{d\,\alpha q^j}{ds} = \frac{\alpha^2}{\beta^2}\,\tfrac12\sum g_{ij}\frac{dq^i}{dt}\frac{dq^j}{dt}.$$
Let us choose $\beta$ so that $\alpha^2/\beta^2 = \alpha^p$, that is,
$$\beta = \alpha^{1 - p/2}.$$

Then
$$L\Big(\alpha q^1, \dots, \alpha q^n, \frac{d\,\alpha q^1}{ds}, \dots, \frac{d\,\alpha q^n}{ds}\Big) = \alpha^p\,L\Big(q^1, \dots, q^n, \frac{dq^1}{dt}, \dots, \frac{dq^n}{dt}\Big).$$

In other words, replacing $q^i$ by $\alpha q^i$ and $t$ by $\beta t$ carries solutions into solutions: if we change the linear scale by $\alpha$ and change the time scale by $\alpha^{1 - p/2}$, we obtain an isomorphic situation. For instance, if $U$ is homogeneous of degree $-1$ (as in the case of an inverse-square law of attraction), then $\beta = \alpha^{3/2}$. In particular, the period of any periodic orbit is proportional to the $\tfrac32$-power of its linear dimension, which is just Kepler’s third law. We thus see that Kepler’s third law is an exact consequence of the inverse-square law of attraction.
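The scaling rule $\beta = \alpha^{1 - p/2}$ is easy to tabulate. The following sketch uses a hypothetical helper `time_scale`; the sample degrees other than $p = -1$ (the harmonic oscillator, $p = 2$, and a uniform field, $p = 1$) are standard illustrations added here and do not appear in the text.

```python
def time_scale(length_scale, degree):
    """Time-scale factor beta = alpha**(1 - p/2) for a potential homogeneous of degree p."""
    if length_scale <= 0:
        raise ValueError("the length scale alpha must be positive")
    return length_scale ** (1.0 - degree / 2.0)

# Inverse-square attraction (p = -1): periods scale like length**(3/2).
kepler = time_scale(4.0, -1)
# Harmonic oscillator (p = 2): the period does not depend on the amplitude.
oscillator = time_scale(3.0, 2)
# Uniform field (p = 1): times scale like the square root of the length.
uniform_field = time_scale(4.0, 1)
```

For example, enlarging a Kepler orbit by a factor of 4 lengthens its period by a factor of 8.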

Returning to the study of the general Lagrangian system, we observe again 
that it is a system of second-order differential equations. We can therefore 
apply the fundamental existence and uniqueness theorem to it to conclude that 
for every $x \in M$ and for every $\xi \in T_x(M)$ there is a unique curve $C(\cdot)$ which is a trajectory of the system and for which $C(0) = x$ and $C'(0) = \xi$.


9. VARIATIONAL PRINCIPLES 


The function $L$ plays a crucial role in the study of variational principles of mechanics. Consider the following problem: Let $p$ and $q$ be two points of $M$, and let $t_1 < t_2$ be two real numbers. For any differentiable curve $C$ defined on




the interval $[t_1, t_2]$ we set
$$I[C] = \int_{t_1}^{t_2} L(C'(t))\,dt. \tag{9.1}$$

Note that $C'(t) \in T(M)$ and $L$ is a function on $T(M)$, so the integrand makes sense. Among all curves joining $p$ to $q$, find that curve for which $I[C]$ takes on its minimum value. We shall see that a necessary condition for $I[C]$ to be a minimum is that $C$ be a trajectory of a mechanical system. (In fact, if a suitable notion of neighborhood is introduced on the space of curves, it is also a necessary condition for $C$ to be even a local minimum.)

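Before establishing this result, it is convenient to have another expression for $I[C]$. As before, let $\mathcal{L}$ be the map of $T(M) \to T^*(M)$ given by the Riemann metric. Let $\bar C$ be the curve in $T^*(M)$ given by

$$\bar C(t) = \mathcal{L}(C'(t)).$$
Then

$$I[C] = \int_{\bar C}\theta - \int_{t_1}^{t_2} H(\bar C(t))\,dt, \tag{9.2}$$

where $\theta$ is the fundamental linear form on $T^*(M)$.
In fact, in terms of local coordinates,

$$\int_{\bar C}\theta = \sum\int_{\bar C} p^i\,dq^i = \sum\int_{t_1}^{t_2}\frac{\partial L}{\partial\dot q^i}\frac{dq^i}{dt}\,dt = \sum\int_{t_1}^{t_2}\frac{\partial L}{\partial\dot q^i}\,\dot q^i\,dt,$$
since the curve $C'$ by definition is such that
$$\dot q^i = \frac{dq^i}{dt}.$$
But, by (8.5),
$$\sum\int_{t_1}^{t_2}\frac{\partial L}{\partial\dot q^i}\,\dot q^i\,dt = \sum\int_{t_1}^{t_2}(p^i\circ\mathcal{L})\,\dot q^i\,dt = \int_{t_1}^{t_2}[H\circ\mathcal{L}(C'(t)) + L(C'(t))]\,dt,$$

so (9.2) holds.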
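Let $Z$ be a vector field on $M$ which generates a flow $\varphi$. For all sufficiently small $s$ the curve $\varphi_s\circ C(\cdot)$ will be defined and

$$(\varphi_s\circ C)'(t) = \varphi_{s*}(C'(t)) = T[\varphi_s](C'(t)),$$
so that

$$I[\varphi_s\circ C] = \int_{t_1}^{t_2} L(T[\varphi_s]C'(t))\,dt = \int_{\mathcal{L}\circ T[\varphi_s]\circ C'}\theta - \int_{t_1}^{t_2} H(\mathcal{L}\circ T[\varphi_s](C'(t)))\,dt. \tag{9.3}$$

Since $I[C]$ is to be a minimum, we must have

$$\frac{d\,I[\varphi_s\circ C]}{ds}\bigg|_{s=0} = 0.$$

We will now compute this derivative so as to derive the consequences of the above equation.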




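Now $\mathcal{L}\circ T[\varphi_s]\circ\mathcal{L}^{-1}$ is a flow on $T^*(M)$ which satisfies
$$\pi\circ\mathcal{L}\circ T[\varphi_s]\circ\mathcal{L}^{-1} = \varphi_s\circ\pi.$$

Let $\bar Z$ be the infinitesimal generator of this flow, so that at all points of $T^*(M)$ we have
$$\pi_*\bar Z = Z. \tag{9.4}$$

If we differentiate (9.3) with respect to $s$ at $s = 0$, we obtain

$$\frac{d}{ds}I[\varphi_s\circ C]\bigg|_{s=0} = \int_{\bar C} D_{\bar Z}\theta - \int_{t_1}^{t_2} D_{\bar Z}H(\bar C(t))\,dt.$$

Now
$$D_{\bar Z}\theta = d(\bar Z \,\lrcorner\, \theta) + \bar Z \,\lrcorner\, d\theta \quad\text{and}\quad \langle\bar Z, \theta\rangle_l = \langle\pi_*\bar Z, l\rangle = \langle Z, l\rangle = f_Z(l),$$

so we get

$$\frac{d}{ds}I[\varphi_s\circ C]\bigg|_{s=0} = \int_{\bar C}\bar Z \,\lrcorner\, d\theta - \int_{t_1}^{t_2} D_{\bar Z}H(\bar C(t))\,dt + f_Z(\bar C(t_2)) - f_Z(\bar C(t_1)). \tag{9.5}$$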


Now suppose that the vector field $Z$ vanishes at $p$ and $q$. Then all the curves $\varphi_s\circ C$ join $p$ to $q$. If $C$ is to minimize the integral $I$, then the derivative $d(I[\varphi_s\circ C])/ds$ must vanish at $s = 0$. Note that in this case the last two terms of (9.5) vanish and we must have

$$\int_{\bar C}\bar Z \,\lrcorner\, d\theta - \int_{t_1}^{t_2} D_{\bar Z}H(\bar C(t))\,dt = 0 \tag{9.6}$$
for all vector fields $Z$ vanishing at $p$ and $q$. In particular, let us take

$$Z = \psi\frac{\partial}{\partial x^i} \quad\text{for } x \in U, \qquad Z = 0 \quad\text{for } x \notin U,$$

where $\psi$ is a $C^\infty$-function whose support lies in some coordinate neighborhood $U$ of $M$ with coordinates $\langle x^1, \dots, x^n\rangle$. Suppose further that $\psi(p) = \psi(q) = 0$ if $p \in U$ or $q \in U$. Then by (2.7)
egy es oY 0. 
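so that
$$\bar Z = \psi\frac{\partial}{\partial q^i} + \sum_j B^j\frac{\partial}{\partial p^j},$$
where $B^j$ are some functions on $T^*(M)$ which depend linearly on $Z$. [This, of course, is just a restatement of (9.4).] Then

$$\bar Z \,\lrcorner\, d\theta = \bar Z \,\lrcorner\, \sum_j dp^j\wedge dq^j = \sum_j B^j\,dq^j - \psi\,dp^i,$$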




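while
$$D_{\bar Z}H = \psi\frac{\partial H}{\partial q^i} + \sum_j B^j\frac{\partial H}{\partial p^j}.$$

Thus if the curve $\bar C$ is given by $\bar C(t) = \langle q^1(t), \dots, q^n(t), p^1(t), \dots, p^n(t)\rangle$, Eq. (9.6) becomes

$$\int_{t_1}^{t_2}\Big[\sum_j B^j\Big(\frac{dq^j}{dt} - \frac{\partial H}{\partial p^j}\Big) - \psi\Big(\frac{dp^i}{dt} + \frac{\partial H}{\partial q^i}\Big)\Big]\,dt = 0.$$
Now by construction, $\bar C(t) = \mathcal{L}\circ C'(t)$, so on $U$ we have

$$\frac{dq^j}{dt} = \dot q^j = \frac{\partial H}{\partial p^j} \tag{9.7}$$

by (8.6). Thus the first sum occurring in the above integral vanishes and we must have
$$\int_{t_1}^{t_2}\psi\Big(\frac{dp^i}{dt} + \frac{\partial H}{\partial q^i}\Big)\,dt = 0.$$

This must hold for all functions $\psi$ whose support lies in a coordinate neighborhood and which vanish at $p$ and $q$. Clearly, this can happen only if

$$\frac{dp^i}{dt} = -\frac{\partial H}{\partial q^i}. \tag{9.8}$$

This must hold for all $i$. Since (9.7) and (9.8) are exactly (5.2), we can assert: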


Proposition 9.1. A necessary condition for $C$ to minimize the functional
$I[C]$ is that $C$ is a trajectory of the corresponding dynamical system.


The question of when this necessary condition is also sufficient is a more 
complicated one. We shall not discuss it in any generality here, but refer the 
reader to any standard source on the calculus of variations. 


Fig. 13.3 


By the way, we can derive a bonus from (9.5). Suppose that we consider the following problem: Let $N_1$ and $N_2$ be two submanifolds (Fig. 13.3) of $M$, and suppose that we require that $C$ minimize $I$ among all curves joining $N_1$ to $N_2$ and not merely among those joining $p$ to $q$. In this case (9.5) will have to vanish for all vector fields $Z$ which are tangent to $N_1$ at $p$ and to $N_2$ at $q$. Now observe




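that if $C$ is a solution to this minimum problem, it certainly is a solution to the problem of minimizing $I$ among all curves joining $p$ to $q$. In particular, $C$ must be a trajectory of the mechanical system. As the reader can easily check, this implies that

$$\int_{\bar C}\bar Z \,\lrcorner\, d\theta - \int_{t_1}^{t_2} D_{\bar Z}H(\bar C(t))\,dt = 0$$

for all vector fields $Z$. Thus, if $C$ solves the more difficult minimization problem, we must have
$$f_Z(\bar C(t_2)) - f_Z(\bar C(t_1)) = 0.$$
Now
$$f_Z(\bar C(t_1)) = \langle Z_p, \bar C(t_1)\rangle \quad\text{and}\quad f_Z(\bar C(t_2)) = \langle Z_q, \bar C(t_2)\rangle.$$

Since if $p \neq q$ we can choose $Z_p$ arbitrarily in $T_p(N_1)$ and $Z_q$ arbitrarily in $T_q(N_2)$, we conclude in this case that $C$ must also satisfy

$$\langle\xi, \bar C(t_1)\rangle = 0 \quad\text{for all } \xi \in T_p(N_1),$$
$$\langle\eta, \bar C(t_2)\rangle = 0 \quad\text{for all } \eta \in T_q(N_2). \tag{9.9}$$
Since $\bar C(t_1) = \mathcal{L}C'(t_1)$, we have

$$\langle\xi, \bar C(t_1)\rangle = \langle\xi, \mathcal{L}C'(t_1)\rangle = (\xi, C'(t_1)),$$
so we can write (9.9) as

$$(\xi, C'(t_1)) = 0 \quad\text{for all } \xi \in T_p(N_1),$$
$$(\eta, C'(t_2)) = 0 \quad\text{for all } \eta \in T_q(N_2). \tag{9.10}$$

In other words, the curve $C$ must be orthogonal to the submanifolds $N_1$ and $N_2$.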

Although our statement of Proposition 9.1 was couched in the framework of dynamical systems, it actually can be formulated in a more general context. Let $L$ be any function on $T(M)$, not necessarily of the type $K - U$. We can then define the integral $I$ as in (9.1) and again pose the minimization problem. This is the typical problem in the calculus of variations. We have already discussed this problem from a different point of view in Section 3.15. We leave to the reader the task of showing that our arguments carry over in this more general case if the matrix
$$\Big(\frac{\partial^2 L}{\partial\dot q^i\,\partial\dot q^j}\Big)$$

is nowhere singular. (Here the map $\mathcal{L}$ of $T(M) \to T^*(M)$ is given by

$$\langle x^1, \dots, x^n, \dot q^1, \dots, \dot q^n\rangle \mapsto \Big\langle x^1, \dots, x^n, \frac{\partial L}{\partial\dot q^1}, \dots, \frac{\partial L}{\partial\dot q^n}\Big\rangle$$

and the nonsingularity guarantees that this map is locally a diffeomorphism.)




10. GEODESIC COORDINATES 


In this section we depart momentarily from the study of mechanics in order to exhibit some applications of the results of the preceding paragraphs to the study of Riemann manifolds. Note that a Riemann manifold always defines a mechanical system by its kinetic energy if we set the potential energy equal to zero. It is this special kind of mechanical system that we wish to study in this section.

Let $M$ be a finite-dimensional manifold with a Riemann metric and, as above, define $L$ on $T(M)$ by setting

$$L(v) = \tfrac12\|v\|^2.$$

This then determines a vector field $Y$ on $T(M)$ which corresponds to the system of differential equations (8.7) in terms of local coordinates. Let $p$ be a point of $M$. For every $\xi \in T_p(M)$ there is a unique trajectory $C_\xi(\cdot)$ such that

$$C_\xi(0) = p, \qquad C'_\xi(0) = \xi.$$
In terms of local coordinates, if $\xi = \langle\xi^1, \dots, \xi^n\rangle$, then
$$C_\xi(t) = \langle q^1_\xi(t), \dots, q^n_\xi(t)\rangle,$$
where the $q^i_\xi$ are the unique solutions of the differential equations

$$\frac{dq^i}{dt} = \dot q^i, \qquad \frac{d(\partial L/\partial\dot q^i)}{dt} - \frac{\partial L}{\partial q^i} = 0$$
with the initial conditions

$$q^i(0) = x^i(p), \qquad \dot q^i(0) = \xi^i.$$
By the fundamental existence and uniqueness theorem the functions $q^i_\xi$ are defined for sufficiently small $t$. We can regard $C_\xi(t)$ as dependent on both $\xi$ and $t$. In other words, we have a map $C_\cdot(\cdot)$ assigning to each $\xi \in T_p(M)$ and each sufficiently small $t$ a point of $M$. The map $C_\cdot(\cdot)$ is, in fact, defined in some neighborhood of $T_p(M) \times \{0\} \subset T_p(M) \times R$.

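The above is true for any Lagrangian function. In the case of no potential energy we can say a lot more. Let $s$ be a real number. Consider the curve $C_\xi(s\,\cdot)$, that is,

$$C_\xi(st) = \langle q^1_\xi(st), \dots, q^n_\xi(st)\rangle.$$

Then (suppressing the subscript $\xi$ which is to be understood in the following computations)
$$\frac{dq^i(st)}{dt} = s\,\dot q^i(st),$$
$$\frac{\partial L}{\partial q^i}\big(q^1(st), \dots, q^n(st), s\dot q^1(st), \dots, s\dot q^n(st)\big) = \tfrac12 s^2\sum_{k,l}\frac{\partial g_{kl}}{\partial q^i}\,\dot q^k(st)\,\dot q^l(st),$$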




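and
$$\frac{d}{dt}\Big[\frac{\partial L}{\partial\dot q^i}\big(q^1(st), \dots, q^n(st), s\dot q^1(st), \dots, s\dot q^n(st)\big)\Big] = \frac{d}{dt}\Big[s\sum_j g_{ij}\big(q^1(st), \dots, q^n(st)\big)\,\dot q^j(st)\Big] = s^2\Big[\frac{d}{dt}\frac{\partial L}{\partial\dot q^i}\big(q^1, \dots, q^n, \dot q^1, \dots, \dot q^n\big)\Big](st).$$

In other words,
$$\frac{d}{dt}\frac{\partial L}{\partial\dot q^i}\big(q(s\,\cdot), s\dot q(s\,\cdot)\big) - \frac{\partial L}{\partial q^i}\big(q(s\,\cdot), s\dot q(s\,\cdot)\big) = s^2\Big[\frac{d}{dt}\frac{\partial L}{\partial\dot q^i} - \frac{\partial L}{\partial q^i}\Big](s\,\cdot) = 0.$$
Thus the curve $C_\xi(s\,\cdot)$ is again a trajectory of the system, and we clearly have
$$C_\xi(s\,\cdot)'\big|_0 = s\,C'_\xi(0) = s\xi.$$
By the uniqueness theorem for differential equations we thus must have
$$C_\xi(st) = C_{s\xi}(t). \tag{10.1}$$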


We are therefore led to define a map, which we shall call exp, from $T_p(M) \to M$ by setting
$$\exp(\xi) = C_\xi(1). \tag{10.2}$$

Note that the map exp is defined and differentiable in some neighborhood of the origin in $T_p(M)$. In fact, by (10.1),

$$\exp(\xi) = C_{\xi/\|\xi\|}(\|\xi\|),$$

where now $\xi/\|\xi\|$ lies on the unit sphere in $T_p(M)$ [the unit sphere with respect to the Euclidean metric given by the scalar product on $T_p(M)$]. Since the unit sphere is compact, there is some $\epsilon > 0$ such that $C_\eta(t)$ is defined for all $\eta$ on the unit sphere and all $|t| < \epsilon$. Thus exp will be defined for all $\|\xi\| < \epsilon$.

The map exp is a differentiable map from some neighborhood of the origin in the vector space $T_p(M)$ into the manifold. Let us compute

$$\exp_{*0}: T_0[T_p(M)] \to T_p(M).$$

Let $\xi \in T_p(M)$, and let us consider the straight line through 0, $l_\xi$, in $T_p(M)$ given by $l_\xi(t) = t\xi$. Since $T_p(M)$ is a vector space, we can identify $T_0[T_p(M)]$ with $T_p(M)$ via the identity chart, in which case we identify

$$l'_\xi(0) \text{ with } \xi.$$
But
$$\exp[l_\xi(t)] = \exp(t\xi) = C_{t\xi}(1) = C_\xi(t), \tag{10.3}$$




and the tangent to this curve at 0 is just $\xi$. In other words,
$$\exp_{*0}[l'_\xi(0)] = \xi.$$
Thus, if we identify $T_0[T_p(M)]$ with $T_p(M)$, we can assert that

$$\exp_{*0}: T_0[T_p(M)] \to T_p(M)$$
is given by
$$\exp_{*0}(\xi) = \xi. \tag{10.4}$$

In particular, the map $\exp_{*0}$ is nonsingular, so that by the implicit-function theorem the map exp is a diffeomorphism in some neighborhood of the origin.

We have thus constructed a diffeomorphism exp from some neighborhood of the origin in $T_p(M)$ into $M$, which by (10.3) carries straight lines through the origin into trajectories through $p$. In the case of no potential energy the trajectories are called geodesics for reasons which will soon become apparent. If we identify $T_p(M)$ with $V$ by some chart $(U, \alpha)$, we can then use exp to introduce a new chart $(U', \eta_\alpha)$ by setting

$$\eta_\alpha^{-1}(\xi_\alpha) = \exp(\xi).$$

The chart $\eta_\alpha$ has the property that $\eta_\alpha(p) = 0$ and $\eta_\alpha$ carries geodesics through $p$ into straight lines through the origin. The chart $(U', \eta_\alpha)$ is called a geodesic normal chart, and corresponding coordinates are called geodesic normal coordinates on $M$.

Let us consider the curve $C_\xi(\cdot) = \exp(\cdot\,\xi)$, which is defined for $0 \le t \le 1$ so long as $\|\xi\| < \epsilon$. We have

$$I[C_\xi(\cdot)] = \int_0^1 L(C'_\xi(t))\,dt = \tfrac12\int_0^1\|C'_\xi(t)\|^2\,dt.$$

But, by the conservation of energy, for any trajectory of our system we have $H\circ\mathcal{L}(C'(t))$ = const. In this case, since $U = 0$, $H\circ\mathcal{L}(C'(t)) = L(C'(t)) = \tfrac12\|C'(t)\|^2$ = const. Since $C'_\xi(0) = \xi$, we have $\|C'_\xi(t)\| = \|\xi\|$, and therefore

$$I[C_\xi(\cdot)] = \tfrac12\int_0^1\|\xi\|^2\,dt = \frac{\|\xi\|^2}{2}. \tag{10.5}$$


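Now let $\{\theta_s\}$ be a one-parameter group of rotations in $T_p(M)$. Then

$$\varphi_s = \exp\circ\,\theta_s\circ\exp^{-1}$$

defines a one-parameter group on the open set $U = \exp\{v : \|v\| < \epsilon\} \subset M$. If $\|\xi\| < \epsilon$, we have by (10.5)

$$I[\varphi_s\circ C_\xi(\cdot)] = \frac{\|\theta_s\xi\|^2}{2} = \frac{\|\xi\|^2}{2},$$

so, by (9.10), we get

$$\frac{d}{ds}I[\varphi_s\circ C_\xi(\cdot)] = 0 = f_Z(\bar C(1)) = (C'(1), Z_{C(1)}),$$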




Fig. 13.4


where $Z$ is the infinitesimal generator of $\varphi$. But $Z_{C(1)} = \exp_* Y_\xi$, where $Y$ is the vector field on $T_p(M)$ which is the infinitesimal generator of the one-parameter group of rotations. Now we can choose the group $\{\theta_s\}$ arbitrarily. Therefore, $Y_\xi$ can be any vector tangent to the sphere of radius $\|\xi\|$ in $T_p(M)$. We thus conclude that $(\eta, C'(1)) = 0$ for any $\eta$ which is tangent to $\exp S_{\|\xi\|}$, where $S_{\|\xi\|}$ is the sphere of radius $\|\xi\|$ in $T_p(M)$. (See Fig. 13.4.) In other words, not only are the rays through the origin orthogonal to the spheres in the Euclidean metric of $T_p(M)$, but also the image of a ray through the origin under exp is orthogonal to the image of a sphere about $p$ in the Riemann metric of $M$.

In particular, we can transfer “polar coordinates” on $T_p(M)$ to $M$ so as to get “geodesic polar coordinates” on $M$. This has the following effect: Let $r$ be the “radial coordinate”, that is, $r$ is the function defined in $U$ by

$$r(x) = \|\exp^{-1}x\|.$$
Then for any $x \in U$, $x \neq p$, and any $\zeta \in T_x(M)$ we have
$$\|\zeta\| \ge |\langle\zeta, dr\rangle|, \tag{10.6}$$

with equality holding only if $\zeta$ is tangent to a geodesic through $p$. In fact, suppose that $\zeta \in T_x(M)$, where $x = \exp\xi$ for some $\xi \in T_p(M)$. Then we can write $\zeta = \zeta_1 + \zeta_2$, where $\zeta_1$ is some multiple of $C'_\xi(1)$ and $\zeta_2$ is tangent to $\exp S_{\|\xi\|}$. By the above result, $(\zeta_1, \zeta_2) = 0$, so

$$\|\zeta\|^2 = \|\zeta_1\|^2 + \|\zeta_2\|^2$$

and we obtain (10.6), with equality holding only if $\|\zeta_2\| = 0$.

We are now in the same fortunate position we were in Section 9 of Chapter 9. Let $D$ be any curve joining $p$ to $x$, where $x = \exp\xi$. Let $t_1$ be the first time that $D(t) \in \exp S_{\|\xi\|}$. Then the length of $D$ is

$$\int_0^1\|D'(t)\|\,dt \ge \int_0^{t_1}\|D'(t)\|\,dt \ge \int_0^{t_1}|\langle D'(t), dr\rangle|\,dt \ge \int_0^{t_1}\langle D'(t), dr\rangle\,dt = r(D(t_1)) = \|\xi\|.$$

Furthermore, equality holds only if $D'(t)$ is a nonnegative multiple of a tangent




to a fixed geodesic through $p$. Then $D$ must be the geodesic $C_\xi(\cdot)$. In short, we have proved:


Theorem 10.1. Let $\epsilon > 0$ be so small that the map exp is a diffeomorphism on $B_\epsilon = \{\xi \in T_p(M) : \|\xi\| < \epsilon\}$. Let $x = \exp\xi$ be a point of $U = \exp B_\epsilon$. Then the geodesic $C_\xi$ joining $p$ to $x$ has length $\|\xi\|$, and any other curve $D$ joining $p$ to $x$ is strictly longer unless $D$ differs from $C_\xi$ only in a (monotone) change of parameter. In other words, loosely speaking, geodesics are locally the shortest curves joining two points.


Since we have come this far, let us show in addition that geodesics also locally minimize the energy. Let $D$ be any curve from $[0, 1]$ to $M$. Then by Schwarz’s inequality we have

$$\Big(\int_0^1\|D'(t)\|\,dt\Big)^2 \le \int_0^1\|D'(t)\|^2\,dt\int_0^1 1\,dt = 2I[D],$$

with equality holding only if $\|D'(t)\|$ is constant. If $C_\xi(\cdot)$ is the geodesic joining $p$ to $x = \exp\xi$, we thus have

$$I[D] \ge \tfrac12\Big(\int_0^1\|D'(t)\|\,dt\Big)^2 \ge \tfrac12\Big(\int_0^1\|C'_\xi(t)\|\,dt\Big)^2 = \tfrac12\|\xi\|^2.$$

Now equality holds in the second inequality only if $D'(t)$ is proportional to $C'(t)$, while equality holds in the first only if $\|D'(t)\|$ = const. We thus conclude that $\|D'(t)\| = \|\xi\|$, that is, $D'(t) = C'(t)$. In short, we have proved:


Proposition 10.1. Under the hypotheses of Theorem 10.1 the curve $C_\xi(\cdot)$ is a strict absolute minimum for $I[C]$ among all curves $C: [0, 1] \to M$ such that $C(0) = p$ and $C(1) = x$.
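The two minimizing properties can be illustrated in the simplest setting. The sketch below assumes the Euclidean plane with its flat metric, where the geodesics are straight lines traversed at constant speed; the polygonal approximation, the function name `length_and_energy`, and the sample curves are choices made here for illustration only.

```python
import math

def length_and_energy(curve, n=1000):
    """Polygonal approximations of the length and of I[D] = (1/2) * integral of |D'|^2 on [0, 1]."""
    dt = 1.0 / n
    points = [curve(k * dt) for k in range(n + 1)]
    length = 0.0
    energy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        length += step
        energy += 0.5 * step * step / dt
    return length, energy

def straight(t):
    """The geodesic from (0, 0) to (3, 4), traversed at constant speed 5."""
    return (3.0 * t, 4.0 * t)

def bent(t):
    """A competing curve with the same endpoints."""
    return (3.0 * t, 4.0 * t + 0.5 * math.sin(math.pi * t))

L0, E0 = length_and_energy(straight)
L1, E1 = length_and_energy(bent)
```

The straight segment has length $\|\xi\| = 5$ and energy $\tfrac12\|\xi\|^2 = 12.5$, while the bent curve is both longer and more energetic. The discrete sums also satisfy the Schwarz inequality $(\text{length})^2 \le 2I[D]$ used in the argument above.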


11. EULER’S EQUATIONS


Under certain circumstances, which we shall presently describe, the equations of motion take a particularly elegant form. The first special assumption that we shall make is that there is an isomorphism of $T(M)$ with $M \times V$. More precisely, we assume that there is a diffeomorphism, $\iota$, of $T(M)$ with $M \times V$ such that $\iota(\xi) = \langle m, v\rangle$, where $m = \pi(\xi)$, and for each $x \in M$ the map $\xi \mapsto v$ of $T_x(M) \to V$ is a linear isomorphism of vector spaces. For example, if $M$ is an open subset of $V$, then the identity chart defines such an isomorphism


t(g) = <m(8), far. 


A slightly less trivial example is furnished by the n-dimensional torus $T^n = S^1 \times \cdots \times S^1$. Then we can introduce “angular variables” $\theta^1, \dots, \theta^n$, where $\theta^i$ is the angular variable on the $i$th circle. We thus obtain $n$ vector fields $\partial/\partial\theta^1, \dots, \partial/\partial\theta^n$ which are linearly independent at each point of $M$. We have a basis of $T_x(M)$ and therefore an isomorphism of $T_x(M)$ with $R^n = V$ which




defines the desired map $\iota$. We shall encounter a more complicated example in the next section. We should point out that only very special kinds of manifolds admit such an isomorphism $\iota$ of $T(M)$ with $M \times V$.

For the rest of this section we shall identify $T(M)$ with $M \times V$ and $T^*(M)$ with $M \times V^*$ via the corresponding (adjoint) isomorphism.

The rule which identifies each $T_x(M)$ with $V$ can be regarded as a $V$-valued
linear differential form on $M$. Let us denote this form by $\omega$. In other words,
the identification $\iota$ is given by $\iota(\xi) = \langle x, (\xi, \omega_x)\rangle$ if $\xi \in T_x(M)$. We can therefore
study the $V$-valued exterior two-form $d\omega$. For each pair of tangent vectors
$\xi, \eta \in T_x(M)$ we obtain $(\eta, \xi \lrcorner d\omega)$ as an element of $V$. Now we can identify $\xi$
and $\eta$ with vectors of $V$. We thus obtain a $V$-valued antisymmetric bilinear form
on $V$; if we call it $a_x$, we have

\[ a_x(v, w) = (\eta, \xi \lrcorner d\omega), \]

where $(\xi, \omega_x) = v$, $(\eta, \omega_x) = w$, and $\xi, \eta \in T_x(M)$. Note that, in general, the
bilinear form $a_x$ depends on $x$. Our second fundamental assumption about the
identification $\iota$ is that $a_x$ is independent of $x$. That is, we assume that there is a
$V$-valued bilinear form $a$ such that

\[ (Y, X \lrcorner d\omega) = a((X, \omega), (Y, \omega)) \tag{11.1} \]

for all vector fields $X$ and $Y$ on $M$.

In the examples given above (the open subset of $V$ or the torus) $d\omega = 0$,
so that (11.1) holds trivially. In the next section we shall come across a case
where $d\omega \neq 0$.

To understand (11.1) a little better, let us introduce the following notation:
For any $v \in V$ let $\hat v$ be the vector field on $M$ given by $(\hat v, \omega)_x = v$ for all $x \in M$.
Then for any $v, w \in V$, we have

\[ (\hat w, \hat v \lrcorner d\omega) = D_{\hat v}(\hat w, \omega) - D_{\hat w}(\hat v, \omega) - ([\hat v, \hat w], \omega). \]

Now $(\hat w, \omega) = w$ and $(\hat v, \omega) = v$ are constants, so the first two terms on the right
disappear. Thus we can rewrite (11.1) as

\[ a(v, w) = -([\hat v, \hat w], \omega). \tag{11.1'} \]


For the kinetic energy of a mechanical system we need a Riemann metric
on $M$. A Riemann metric on $M$ gives a scalar product on each $T_x(M)$. This
means giving a scalar product $(\ ,\ )_x$ on $V$ for each $x \in M$. Our third special
assumption is that $(\ ,\ )_x$ does not depend on $x$. Thus we are given a scalar
product on $V$ which gives the Riemann metric on $M$ via the identification of
$T(M)$ with $M \times V$.

We wish to describe the vector field on $T^*(M)$ as $X_{dH}$, where $H = K + U$,
$K$ being the kinetic energy of the Riemann metric and $U$ being some potential
energy. Since $T^*(M) = M \times V^*$, a vector field $X$ on $T^*(M)$ can be uniquely
written as $X = X_1 + X_2$, where $X_1$ is tangent to $M$ and $X_2$ is tangent to $V^*$.
Furthermore, we can regard $X_1$ as a $V$-valued function and $X_2$ as a $V^*$-valued




function (identifying the tangent space to the vector space $V^*$ with $V^*$). Then
\[ (X, \theta)_{\langle x, l\rangle} = (X_1, l) \]
at any $\langle x, l\rangle \in M \times V^*$. We should really write this as
\[ (\cdot, \theta) = ((\cdot, \pi^*\omega), p), \]
where $\omega$ is the form defined above and the $V^*$-valued function $p: M \times V^* \to V^*$
is the projection onto the second factor, $p(m, l) = l$. Then
\[ (Y, X \lrcorner d\theta) = ((Y, \pi^*\omega), (X, dp)) - ((X, \pi^*\omega), (Y, dp)) - ((Y, X \lrcorner \pi^* d\omega), p). \]
Substituting $Y = Y_1 + Y_2$ and using (11.1), we obtain
\[ (Y, X \lrcorner d\theta) = (Y_1, X_2) - (X_1, Y_2) - (a(X_1, Y_1), l). \]
Now $H = K + U$, where $K(l) = \tfrac12 (l, l)$ and $U(x, l) = U(x)$. Thus
\[ (Y, dH) = (Y_2, l) + (Y_1, dU), \]
and the equation
\[ (Y, X \lrcorner d\theta) = -(Y, dH) \]
becomes
\[ (Y_1, X_2) - (X_1, Y_2) - (a(X_1, Y_1), l) = -(Y_2, l) - (Y_1, dU), \tag{11.2} \]
which must hold for all choices of $Y$.


Setting $Y_1 = 0$, we get
\[ (X_1, \cdot) = (\cdot, l). \tag{11.3} \]

In fact, the scalar product occurring in the last equation is on $V^*$. Transferring
it to an equation on $V$, we get
\[ (X_1, \cdot) = (\cdot, v). \tag{11.4} \]

Let $t \mapsto \langle C(t), v(t)\rangle$ be a solution curve of our system transferred to $T(M) =
M \times V$ by the Riemann metric. Then this last equation says $C'(t) = v(t)$.
If we set $Y_2 = 0$ in (11.2) and use (11.4), we get
\[ (\cdot, X_2) - (a(X_1, \cdot), X_1) = -(\cdot, dU). \]
Now for any solution curve $\langle C, v\rangle$ we have
\[ X_2 = \frac{dv}{dt}, \]
so that these last two equations can be rewritten as
\[ C'(t) = v(t), \qquad \Bigl(\cdot, \frac{dv}{dt}\Bigr) = (a(v, \cdot), v) - (\cdot, dU), \tag{11.5} \]
which are known as Euler’s equations.




12. RIGID-BODY MOTION 


We shall apply the results of the previous section to the study of the motion of 
a rigid body. For simplicity, we shall confine ourselves to the study of the 
motion of a rigid body with one point fixed. The more general case where the 
body as a whole is allowed to move can be handled by similar methods. (Fre- 
quently, by considering first the motion of the center of gravity, the more 
general case can be split into two parts: the motion of the center of gravity and 
the motion of the body relative to the center of gravity. This then reduces the 
problem to the one we are studying.) 

In order to exhibit the generality of our method, we shall consider the
equations of motion of a rigid body in $E^n$. Only at the end will we make use of
the fact that $n = 3$. Let us fix some positive orthonormal system “drawn through
the fixed point of the body”. In other words, we fix some initial position $x_0 =
\langle b^1, \dots, b^n\rangle$ of the body. Any other position, $x_1$, of the body can be obtained
from $x_0$ by a rotation: $x_1 = R_1 x_0$. Let $R(t) = e^{At}$ be a one-parameter group
of rotations. Then

\[ R_1 R(t) x_0 = R_1 R(t) R_1^{-1} x_1 \]

is a curve of possible positions of the body. The tangent to this curve at $x_1$ will
be denoted by $\hat A_{x_1}$. Thus $\hat A_{x_1} \in T_{x_1}(M)$. If $x_2 = R_2 x_0 = R_2 R_1^{-1} R_1 x_0 =
R_2 R_1^{-1} x_1$, then $\hat A_{x_2}$ is the tangent to the curve $R_2 R(t) x_0 = R_2 R_1^{-1} R_1 R(t) x_0$,
so that $\hat A_{x_2} = (R_2 R_1^{-1})_* \hat A_{x_1}$. It is clear from its definition that $\hat A_{x_1} = 0$ if and
only if $A = 0$.

We can regard the map $A \mapsto \hat A_{x_1}$ as a map from the space of skew adjoint
linear transformations to $T_{x_1}(M)$. Let $V$ denote the space of skew adjoint linear
transformations. Then since

\[ \dim V = \dim M = \frac{n(n-1)}{2} \]
and the map $A \mapsto \hat A_{x_1}$ is an injection, we conclude that it is an isomorphism.
We thus have a trivialization $T(M) \to M \times V$. Consequently, we get a $V$-valued
linear differential form $\omega$, as in Section 11.

Let us describe once more the meaning of $\omega_x: T_x(M) \to V$. If $\xi \in T_x(M)$
represents an infinitesimal motion of the body, then since the body is rigid, we
can regard $\xi$ as an infinitesimal rotation of the body relative to an observer
situated outside the body (fixed in space). Thus $\xi = B$ for some $B \in V$, say.
Then $(\xi, \omega)$ is the corresponding infinitesimal rotation expressed in terms of
the basis attached to the body; that is, $(\xi, \omega) = R_1^{-1} B R_1$ if $x = R_1 x_0$.

We denote by $\hat A$ the vector field $x \mapsto \hat A_x$ corresponding to $A \in V$. Let $\varphi_s$
be the one-parameter group generated by $\hat B$ for some $B \in V$. Note that at any
$x_2 = R_2 x_0$ we have $\varphi_s x_2 = R_2 e^{Bs} x_0$. Then at any $x_1 = R_1 x_0$ we have

\[ \varphi_s R_1 e^{At} R_1^{-1} x_1 = \varphi_s R_1 e^{At} x_0 = R_1 e^{At} e^{Bs} x_0 = R_1 e^{Bs} (e^{-Bs} e^{At} e^{Bs}) x_0, \]

so that

\[ \varphi_{s*} \hat A_{x_1} = \bigl(e^{-Bs} A e^{Bs}\bigr)^{\wedge}_{\varphi_s x_1}. \]

If we differentiate this equation with respect to $s$ at $s = 0$, we conclude that
\[ [\hat A, \hat B] = (AB - BA)^{\wedge} = [A, B]^{\wedge}, \]
so that, according to (11.1'), we have
\[ a(A, B) = -[A, B] = BA - AB. \tag{12.1} \]
We now show how a mass distribution on the body determines a Riemann metric
on $M$. Let $p$ be a particle on the body with mass $m$. We will assume that the
particle $p$ has coordinates $\langle p^1, \dots, p^n\rangle$ relative to the axes drawn on the
body. Suppose that the body is at position $x = \langle b_1, \dots, b_n\rangle$. Then the
particle $p$ will be situated at the point $p^1 b_1 + \cdots + p^n b_n \in E^n$. If $R(t) = e^{At}$
is a one-parameter group of rotations, then when the body is at $R_1 R(t) x$ the
particle $p$ will be situated at

\[ R_1[p^1 R(t) b_1 + \cdots + p^n R(t) b_n] = R_1 R(t) p. \]
Thus the velocity of the particle $p$ as the body undergoes the motion generated
by $A$ is $R_1 A p$, and the kinetic energy of the particle $p$ is $\tfrac12 m \|R_1 A p\|^2 =
\tfrac12 m \|Ap\|^2$, since $R_1$ is an orthogonal linear transformation. We define the kinetic
energy of $\hat A_x$ to be the total kinetic energy of all the particles of the body. Thus

\[ \|\hat A_x\|^2 = \int_{\text{body}} m \|Ap\|^2. \tag{12.2} \]

Note that $\|\hat A_x\|$ depends only on $A$, so that our third requirement of the last
section is satisfied, provided (12.2) does indeed define a norm on $V$. (Note that
the mass distribution could be such that $Ap = 0$ for all $p$ in $\{p : m(p) > 0\}$
even though $A \neq 0$. For example, suppose that all the mass were concentrated
along a line $l$. If $A$ represents infinitesimal rotation about $l$, so that
$Ap = 0$ for $p \in l$, then $\|\hat A\| = 0$. However, it is clear that if the set of $p$ for
which $m > 0$ spans $E^n$, then (12.2) defines a Riemann metric. In fact,

\[ \|\hat A\| = 0 \implies Ap = 0 \]

for all $p$ belonging to a spanning set, and thus $A = 0$.)
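The parenthetical remark can be checked on a toy example. The following is a minimal sketch, assuming a body made of finitely many point masses in $E^3$ so that the integral in (12.2) becomes a finite sum; the function names and the two sample mass distributions are illustrative choices, not part of the text.

```python
# Sketch of (12.2) for finitely many point masses in E^3.
# All names and sample data below are illustrative assumptions.

def apply(A, p):
    """Apply the 3x3 matrix A (a list of rows) to the vector p."""
    return [sum(A[r][c] * p[c] for c in range(3)) for r in range(3)]


def norm_sq(A, body):
    """Right-hand side of (12.2): the sum of m * |A p|^2 over the point masses."""
    total = 0.0
    for m, p in body:
        Ap = apply(A, p)
        total += m * sum(x * x for x in Ap)
    return total


# Infinitesimal rotation about the third axis: skew-symmetric, and A e_3 = 0.
A = [[0.0, 1.0, 0.0],
     [-1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

# All the mass sits on the line through e_3, so A p = 0 for every particle.
line_body = [(1.0, [0.0, 0.0, 1.0]), (2.0, [0.0, 0.0, -3.0])]

# Adding unit masses at e_1 and e_2 makes the massive points span E^3.
spanning_body = line_body + [(1.0, [1.0, 0.0, 0.0]), (1.0, [0.0, 1.0, 0.0])]

degenerate = norm_sq(A, line_body)         # A is nonzero, yet this vanishes
nondegenerate = norm_sq(A, spanning_body)  # strictly positive
```

In the first distribution the nonzero rotation has zero "norm", so (12.2) fails to be a norm there; in the second the value is positive, in line with the spanning condition.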

Let us examine the scalar product given in (12.2) a little more closely. Let $f$
denote the linear function on $E^n \otimes E^n$ corresponding to the scalar product on
$E^n$ (which is a bilinear form); in other words,

\[ f(a_1 \otimes b_1 + \cdots + a_k \otimes b_k) = (a_1, b_1) + \cdots + (a_k, b_k). \]
Let $s$ be any element of $E^n \otimes E^n$. Then $s$ defines a bilinear form on $\mathrm{Hom}(E^n)$
given by
\[ s(A, B) = f((A \otimes B)s). \]

Note that if the tensor $s$ is symmetric, so is the corresponding bilinear form. In


our present context the scalar product (12.2) comes from the tensor $I \in E^n \otimes E^n$,
where
\[ I = \int_{\text{body}} m\, p \otimes p \]
is called the inertia tensor of the body.* Thus (12.2) can be written as
\[ \|\hat A\|^2 = I(A, A). \]
Euler’s equations in this case become
\[ C'(t) = \hat A(t) \quad\text{and}\quad \Bigl(\cdot, \frac{dA}{dt}\Bigr) = I([\cdot, A], A) - (\hat{\cdot}, dU). \tag{12.3} \]


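Now the tensor $I$ is symmetric. We can therefore find an orthonormal basis
$e^1, \dots, e^n$ of $E^n$ which diagonalizes $I$, so that

\[ I = I_1 e^1 \otimes e^1 + I_2 e^2 \otimes e^2 + \cdots + I_n e^n \otimes e^n. \]
Let $E_{ij}$ ($i < j$) be the antisymmetric matrix defined by
\[ E_{ij} e^j = e^i, \qquad E_{ij} e^i = -e^j, \qquad E_{ij} e^l = 0 \quad (l \neq i, j). \]
Then
\[ I(E_{ij}, E_{kl}) = 0 \quad\text{if } \{i, j\} \neq \{k, l\} \]
and
\[ I(E_{ij}, E_{ij}) = I_i + I_j. \]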
Now let us see what Eqs. (12.3) say for the case $n = 3$, where we have
\[ [E_{12}, -E_{13}] = E_{23}, \qquad [E_{23}, E_{12}] = -E_{13}, \qquad [-E_{13}, E_{23}] = E_{12}. \]
Suppose $A(t) = a_1(t) E_{23} - a_2(t) E_{13} + a_3(t) E_{12}$, and let
\[ B = b_1 E_{23} - b_2 E_{13} + b_3 E_{12} = \text{const}. \]

Then substituting $B$ into (12.3) and comparing the coefficients of $b_1$, $b_2$, and
$b_3$, we get


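\[
\begin{aligned}
(I_2 + I_3)\frac{da_1}{dt} &= (I_3 - I_2) a_2 a_3 - (E_{23}, dU),\\
(I_1 + I_3)\frac{da_2}{dt} &= (I_1 - I_3) a_1 a_3 - (-E_{13}, dU),\\
(I_1 + I_2)\frac{da_3}{dt} &= (I_2 - I_1) a_1 a_2 - (E_{12}, dU).
\end{aligned}
\tag{12.4}
\]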
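The quadratic terms in (12.4) can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the bracket to be $[A, B] = AB - BA$, represents the diagonalized inertia tensor by the form $I(A, B) = \sum_k I_k (Ae^k, Be^k)$, and evaluates $I([B, A], A)$ from (12.3) on the basis $E_{23}$, $-E_{13}$, $E_{12}$. The helper names and the sample values of $I_k$ and $a_k$ are hypothetical.

```python
# Numerical check of the quadratic terms of (12.4), assuming [A, B] = AB - BA
# and I(A, B) = sum_k I_k (A e_k, B e_k). Names and sample values are illustrative.

def E(i, j):
    """Skew matrix with E_ij e_j = e_i and E_ij e_i = -e_j (indices 1..3)."""
    M = [[0.0] * 3 for _ in range(3)]
    M[i - 1][j - 1] = 1.0
    M[j - 1][i - 1] = -1.0
    return M


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def comb(terms):
    """Linear combination of matrices given as (coefficient, matrix) pairs."""
    return [[sum(coef * M[r][c] for coef, M in terms) for c in range(3)]
            for r in range(3)]


def bracket(A, B):
    return comb([(1.0, matmul(A, B)), (-1.0, matmul(B, A))])


def inertia_form(A, B, I):
    """I(A, B) for the diagonal tensor I = sum_k I_k e_k (x) e_k."""
    return sum(I[k] * sum(A[r][k] * B[r][k] for r in range(3)) for k in range(3))


I = [2.0, 3.0, 5.0]            # sample I_1, I_2, I_3
a1, a2, a3 = 1.0, 2.0, 3.0     # sample components of A
A = comb([(a1, E(2, 3)), (-a2, E(1, 3)), (a3, E(1, 2))])
basis = [E(2, 3), comb([(-1.0, E(1, 3))]), E(1, 2)]

# I([B, A], A) for each basis element B, to be compared with (12.4).
lhs = [inertia_form(bracket(B, A), A, I) for B in basis]
rhs = [(I[2] - I[1]) * a2 * a3, (I[0] - I[2]) * a1 * a3, (I[1] - I[0]) * a1 * a2]

# The diagonal values I(E_ij, E_ij) = I_i + I_j.
diag = [inertia_form(B, B, I) for B in basis]
```

With these sample values `lhs` and `rhs` agree, and `diag` reproduces $I_2 + I_3$, $I_1 + I_3$, $I_1 + I_2$, the coefficients on the left of (12.4).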


* This differs slightly from that which is usually called the inertia tensor in physics
texts. In terms of the coefficients $I_i$ introduced above, what is usually called $I_i$ is
$I_j + I_k$ (with $i$, $j$, $k$ distinct) in our setup.

A simple case of these equations arises when there is no potential, that is,
$U = 0$. First of all, suppose that the body has a spherically symmetric distribution
of mass. Then $I_1 = I_2 = I_3$ and Eqs. (12.4) become

\[ \frac{da_1}{dt} = \frac{da_2}{dt} = \frac{da_3}{dt} = 0. \]


In other words, A = const. The motion of the body consists of a steady rotation 
about some axis fixed on the body. Of course, this means that for an observer 
in space also, the body undergoes a steady rotation about a fixed axis. 

Next, let us consider the case of an axially symmetric rigid body moving
freely; that is, $I_1 = I_2$ and $U = 0$. The equations of motion then become

\[ \frac{da_1}{dt} = K a_2 a_3, \qquad \frac{da_2}{dt} = -K a_1 a_3, \qquad \frac{da_3}{dt} = 0, \]
where
\[ K = \frac{I_3 - I_2}{I_3 + I_2}. \]

The solutions of these equations can be written down immediately: $a_3 =
s = \text{const}$ and
\[ a_1 = c_1 \cos Kst + c_2 \sin Kst, \qquad a_2 = -c_1 \sin Kst + c_2 \cos Kst. \]


Thus for an observer situated on the body the instantaneous axis of rotation 
describes a circle around the axis of symmetry of the body. This motion is 
known as regular precession. (This motion should not be confused with the astro- 
nomical precession of the earth’s axis, which is due to the gravitational action 
of the sun and moon.) 
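As an informal check of the regular precession just described, one can integrate the free equations of the axially symmetric body numerically and compare with the closed-form solution. The sketch below assumes the equations $da_1/dt = K a_2 a_3$, $da_2/dt = -K a_1 a_3$, $da_3/dt = 0$ with $K = (I_3 - I_2)/(I_3 + I_2)$; the classical Runge-Kutta integrator, the step size, and the sample constants are illustrative choices.

```python
import math

# Free axially symmetric body: numerical integration versus the closed form.
# Integrator, step size and sample constants are illustrative assumptions.

def euler_rhs(a, K):
    a1, a2, a3 = a
    return (K * a2 * a3, -K * a1 * a3, 0.0)


def rk4_step(a, K, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, k, s):
        return tuple(b + s * ki for b, ki in zip(base, k))
    k1 = euler_rhs(a, K)
    k2 = euler_rhs(shifted(a, k1, h / 2), K)
    k3 = euler_rhs(shifted(a, k2, h / 2), K)
    k4 = euler_rhs(shifted(a, k3, h), K)
    return tuple(x + h / 6 * (p + 2 * q + 2 * r + w)
                 for x, p, q, r, w in zip(a, k1, k2, k3, k4))


def closed_form(t, c1, c2, s, K):
    """a_1, a_2, a_3 at time t for the solution with constants c1, c2, s."""
    return (c1 * math.cos(K * s * t) + c2 * math.sin(K * s * t),
            -c1 * math.sin(K * s * t) + c2 * math.cos(K * s * t),
            s)


I2, I3 = 1.0, 3.0
K = (I3 - I2) / (I3 + I2)
c1, c2, s = 1.0, 0.5, 2.0
h, steps = 0.001, 2000

a = closed_form(0.0, c1, c2, s, K)
max_error = 0.0
for n in range(1, steps + 1):
    a = rk4_step(a, K, h)
    exact = closed_form(n * h, c1, c2, s, K)
    max_error = max(max_error, max(abs(x - y) for x, y in zip(a, exact)))
```

The pair $(a_1, a_2)$ moves on a circle of radius $\sqrt{c_1^2 + c_2^2}$ with angular speed $Ks$ while $a_3$ stays fixed, which is the circle described by the instantaneous axis of rotation.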

If no two of $I_1$, $I_2$, and $I_3$ are equal, then the equations of motion can still
be solved in terms of integrals, although the expression is rather complicated.
We refer the reader to any standard book on mechanics for the details.

So far we have been considering the motion with no potential. If $U \neq 0$,
then Eqs. (12.4) are usually not so easy to solve. Let us treat the case of a
symmetrical top; that is, $I_1 = I_2$ and $U$ is given by the gravitational potential.
In order to solve this problem it is convenient to use the Euler angles of astronomy
as the local coordinates on $M$. In order to avoid confusion, we reproduce
the definitions here: We let $\delta_x, \delta_y, \delta_z$ be the basis vectors of $E^3$ corresponding to
the rectangular coordinates $x, y, z$, where we take the origin of $E^3$ to be the fixed
point of the body. We may assume that the center of mass of the body is distinct
from the fixed point; otherwise there will be no gravitational force acting on
the body. It is easy to see that the center of mass $c$ lies on the axis of symmetry.
We shall assume that the vectors drawn on the body are such that $b_3$ points in
the direction from the fixed point to the center of mass, i.e., from 0 to $c$. Then
$0 < \theta < \pi$ is defined to be the angle between $b_3$ and $\delta_z$:
\[ \cos\theta = (b_3, \delta_z). \]


The line of nodes is the intersection of the plane spanned by $b_1$ and $b_2$ with the
$xy$-plane. (Thus, in order for it to be defined, we must restrict to the open set
$b_3 \neq \pm\delta_z$.) We define the unit vector $\eta$ along the line of nodes to be the one that
makes $\delta_z, b_3, \eta$ a positive (right-handed) basis of $E^3$.

[Fig. 13.5. The line of nodes.]


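The angle $0 < \psi < 2\pi$ is defined to be the angle that $\eta$ makes with $\delta_x$, and
$0 < \varphi < 2\pi$ is defined to be the angle that $\eta$ makes with $b_1$. We now wish to find
the transformation relating the basis of $T_x(M)$ given by $\partial/\partial\theta$, $\partial/\partial\psi$, $\partial/\partial\varphi$ with
the orthogonal basis $E_{12}$, $-E_{13}$, $E_{23}$ introduced earlier. Suppose that $x$ has
the coordinates $\langle\theta, \psi, \varphi\rangle$. (See Fig. 13.5.) Now $(\partial/\partial\theta)_x$ represents an infinitesimal
rotation about the line of nodes and $\eta = (\cos\varphi) b_1 - (\sin\varphi) b_2$, so that

\[ \Bigl(\frac{\partial}{\partial\theta}\Bigr)_x = (\cos\varphi) E_{23} - (\sin\varphi)(-E_{13}). \]

The vector $(\partial/\partial\varphi)_x$ represents infinitesimal rotation about $b_3$, so that

\[ \Bigl(\frac{\partial}{\partial\varphi}\Bigr)_x = E_{12}. \]

Finally, $(\partial/\partial\psi)_x$ represents infinitesimal rotation about $\delta_z$. Now
\[ (\delta_z, b_3) = \cos\theta. \]

Furthermore, since $(\delta_z, \eta) = 0$, the projection of $\delta_z$ onto the plane spanned by
$b_1$ and $b_2$ must still be orthogonal to $\eta$, since $\eta$ lies in that plane. It is therefore
easy to check that this projection is a multiple of $(\sin\varphi) b_1 + (\cos\varphi) b_2$. Thus

\[ \delta_z = \sin\theta\,[(\sin\varphi) b_1 + (\cos\varphi) b_2] + (\cos\theta) b_3, \]
and therefore
\[ \Bigl(\frac{\partial}{\partial\psi}\Bigr)_x = (\sin\theta \sin\varphi) E_{23} + (\sin\theta \cos\varphi)(-E_{13}) + (\cos\theta) E_{12}. \]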


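If $\xi \in T_x(M)$ is given by
\[ \xi = \dot\theta \Bigl(\frac{\partial}{\partial\theta}\Bigr)_x + \dot\psi \Bigl(\frac{\partial}{\partial\psi}\Bigr)_x + \dot\varphi \Bigl(\frac{\partial}{\partial\varphi}\Bigr)_x, \]
we have
\[
\begin{aligned}
2K(\xi) = \|\xi\|^2
 &= \dot\theta^2 [(\cos\varphi)^2 (I_2 + I_3) + (\sin\varphi)^2 (I_1 + I_3)] + \dot\varphi^2 (I_1 + I_2)\\
 &\quad + 2\dot\varphi\dot\psi (I_1 + I_2) \cos\theta + 2\dot\theta\dot\psi (I_2 - I_1) \sin\theta \sin\varphi \cos\varphi\\
 &\quad + \dot\psi^2 \{(\sin^2\theta)[(I_2 + I_3)\sin^2\varphi + (I_1 + I_3)\cos^2\varphi] + (I_1 + I_2)\cos^2\theta\}.
\end{aligned}
\]
Now, by assumption $I_1 = I_2$. Let us set
\[ M_1 = I_1 + I_3 = I_2 + I_3 \quad\text{and}\quad M_2 = I_1 + I_2. \]
Then the above expression simplifies to
\[ K(\theta, \psi, \varphi; \dot\theta, \dot\psi, \dot\varphi) = \tfrac12 M_1 (\dot\theta^2 + \dot\psi^2 \sin^2\theta) + \tfrac12 M_2 (\dot\varphi + \dot\psi \cos\theta)^2. \tag{12.5} \]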


Let us now obtain the expression for the potential energy. It is proportional 
to the height of the center of gravity if we assume (as we shall) a uniformly 
vertical gravitational field. Thus 


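\[ U(\theta, \psi, \varphi) = k \cos\theta, \tag{12.6} \]

where $k = mg\|c\|$ and $m$ is the total mass of the body, $g$ is the force of the
gravitational field, and $\|c\|$ is the distance of the center of gravity from the fixed
point. Thus the Lagrangian is given by

\[ L(\theta, \psi, \varphi; \dot\theta, \dot\psi, \dot\varphi) = \tfrac12 M_1 (\dot\theta^2 + \dot\psi^2 \sin^2\theta) + \tfrac12 M_2 (\dot\varphi + \dot\psi \cos\theta)^2 - k \cos\theta. \tag{12.7} \]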


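Note that $\partial/\partial\varphi$ and $\partial/\partial\psi$ are both isometries of the Riemann metric which
leave $U$ invariant. Thus the corresponding moments are constants of the motion.
In other words, for any motion of the system we have

\[ P_{\partial/\partial\psi} = C_1 \quad\text{and}\quad P_{\partial/\partial\varphi} = C_2, \]
where $C_1$ and $C_2$ are constants. But

\[ P_{\partial/\partial\psi} = \frac{\partial L}{\partial\dot\psi} = M_1 \dot\psi \sin^2\theta + M_2 \cos\theta\,(\dot\varphi + \dot\psi \cos\theta) = C_1 \]
and
\[ P_{\partial/\partial\varphi} = M_2 (\dot\varphi + \dot\psi \cos\theta) = C_2. \]

Solving these equations gives

\[ \dot\varphi + \dot\psi \cos\theta = \frac{C_2}{M_2}, \qquad M_1 \dot\psi \sin^2\theta = C_1 - C_2 \cos\theta. \tag{12.8} \]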


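Let us substitute (12.8) into the expression for the energy,

\[ E = K + U = \tfrac12 M_1 (\dot\theta^2 + \dot\psi^2 \sin^2\theta) + \tfrac12 M_2 (\dot\varphi + \dot\psi \cos\theta)^2 + k \cos\theta, \]
to get, for a given value of $C_1$ and $C_2$, the expression

\[ \tfrac12 M_1 \dot\theta^2 + \frac{(C_1 - C_2 \cos\theta)^2}{2 M_1 \sin^2\theta} + \frac{C_2^2}{2 M_2} + k \cos\theta = \text{const}. \tag{12.9} \]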

Thus, just as in our treatment of the central-force problem, for a fixed
value of $P_{\partial/\partial\psi}$ and $P_{\partial/\partial\varphi}$, the motion of $\theta$ is determined by a one-dimensional
mechanical system whose energy is given by (12.9). After solving this mechanical
system for $\theta$, we can then obtain $\psi(\cdot)$ and $\varphi(\cdot)$ by integrations from (12.8).
In order to obtain qualitative information about the behavior of the solutions
$\theta(\cdot)$, we can apply the method of Section 5 to our one-dimensional mechanical
system. Note that if $C_1 \neq \pm C_2$ (as would be the case if the body were “spinning
fast”), the term $(C_1 - C_2\cos\theta)^2 / (2M_1 \sin^2\theta)$ in (12.9) tends to infinity as
$\theta \to 0$ and as $\theta \to \pi$. We therefore
conclude that $\theta$ oscillates between two values $0 < \theta_1 \le \theta \le \theta_2 < \pi$.
In other words, the axis of symmetry of the body executes a periodic up and down
motion (called nutation). As $\theta$ oscillates, $\dot\psi$ satisfies (12.8). Let us graph the
curve that $b_3$ traces out on the unit sphere. If $C_1 - C_2 \cos\theta > 0$ for $\theta_1 \le
\theta \le \theta_2$, then $\dot\psi > 0$ although it oscillates in magnitude. The motion is thus as
given in Fig. 13.6. Another possibility is that

\[ C_1 - C_2 \cos\theta_1 < 0 \quad\text{and}\quad C_1 - C_2 \cos\theta_2 > 0, \]

so that $\dot\psi$ is negative near $\theta = \theta_1$ and positive near $\theta = \theta_2$. In this case the
average of $\dot\psi$ over a period is still positive, so that the motion is as shown in
Fig. 13.7.


Fig. 13.6 Fig. 13.7 Fig. 13.8 


A limiting case is where $C_1 - C_2 \cos\theta_1 = 0$, where the motion of $b_3$ is as
shown in Fig. 13.8. (This is the case that arises if the axis of a spinning top is
held fixed at some position $\theta$, $\psi$ and then allowed to fall.)

The motion of the axis of the body around the $z$-axis in all these cases is called
precession. It should be remembered that all this time the top is spinning about
its axis with constant angular momentum $C_2$.
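The bounds $\theta_1$ and $\theta_2$ of the nutation can be located numerically from (12.9). The sketch below is an illustration under stated assumptions: it drops the constant $C_2^2/(2M_2)$, treats $V(\theta) = (C_1 - C_2\cos\theta)^2/(2M_1\sin^2\theta) + k\cos\theta$ as the potential of the one-dimensional system, and finds the two angles where $V$ equals the energy by scanning a grid and bisecting. The function names and the sample constants are hypothetical.

```python
import math

# Turning points of the one-dimensional system (12.9), with the constant
# C_2^2 / (2 M_2) dropped. Names and sample constants are illustrative.

def effective_potential(theta, M1, C1, C2, k):
    return ((C1 - C2 * math.cos(theta)) ** 2 / (2 * M1 * math.sin(theta) ** 2)
            + k * math.cos(theta))


def turning_points(E, M1, C1, C2, k, samples=20000):
    """Angles in (0, pi) where the effective potential equals E."""
    def g(theta):
        return effective_potential(theta, M1, C1, C2, k) - E

    roots = []
    grid = [math.pi * i / samples for i in range(1, samples)]
    for lo, hi in zip(grid, grid[1:]):
        if g(lo) * g(hi) < 0:          # sign change: bisect this cell
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


M1, C1, C2, k = 1.0, 1.0, 3.0, 1.0
theta0, theta_dot0 = 1.0, 0.5          # sample initial angle and angular velocity
E = 0.5 * M1 * theta_dot0 ** 2 + effective_potential(theta0, M1, C1, C2, k)
theta_1, theta_2 = turning_points(E, M1, C1, C2, k)
```

For these sample constants $C_1 \neq \pm C_2$, so the potential blows up at both ends of $(0, \pi)$ and the initial angle lies strictly between the two turning points.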




13. SMALL OSCILLATIONS 


Suppose we are given a mechanical system on a manifold $M$ with energy
$H = K + U$. Suppose that the “force field” $dU$ vanishes at some $x_0 \in M$.
Then the constant curve $C(t) = x_0$ is a trajectory of the system. In fact, let us
choose a chart $(W, \alpha)$ with coordinates $x^1, \dots, x^n$ such that
\[ \alpha(x_0) = \langle 0, \dots, 0\rangle. \]
Then in the corresponding coordinates $\langle q^1, \dots, q^n, p^1, \dots, p^n\rangle$ on $\pi^{-1}(W)$ we have at
the point $\langle 0, \dots, 0, \dots, 0\rangle$
\[ \frac{\partial H}{\partial q^i} = \frac{\partial U}{\partial x^i} = 0 \quad\text{and}\quad \frac{\partial H}{\partial p^i} = \frac{\partial K}{\partial p^i} = 0. \]

It is therefore natural to expect that for small initial values of $q$ and $p$ and
for small intervals of time, the solutions of the system should be well approximated
by the following linear system:


Replace the potential energy,
\[ U(x^1, \dots, x^n) = \tfrac12 \sum a_{ij} x^i x^j + U_3, \]

[where $U_3 = O(\|x\|^3)$; that is, $U_3$ vanishes to third order at $x = 0$] by the
quadratic potential energy,

\[ U_2(x) = \tfrac12 \sum a_{ij} x^i x^j, \]
and replace the kinetic energy corresponding to the given Riemann metric,
\[ K(q, \dot q) = \tfrac12 \sum g_{ij}(q) \dot q^i \dot q^j, \]
by the one corresponding to the Euclidean metric at $x_0$,
\[ K_0(q, \dot q) = \tfrac12 \sum g_{ij}(0, \dots, 0) \dot q^i \dot q^j. \]


We thus obtain a mechanical system $H_2$ whose corresponding equations
(5.1) are actually linear. [The reader should check, as an exercise, that these
equations are exactly the equations of variation (introduced in Section 2) of the
vector field $X_{dH}$ along the curve $C(t) = \langle 0, \dots, 0, \dots, 0\rangle$ in $T^*(M)$.]

Of course, as time increases, the values of $q^i$ and $p^i$ might become quite
large and the linear approximations useless. However, under certain circumstances
we can guarantee that $q^i$ and $p^i$ remain small for all time. In fact,
suppose that the quadratic form $\sum a_{ij} x^i x^j$ is positive definite. Then $U$ has a
strict minimum at $x_0$, say $U(x_0) = 0$. In particular, if we start at $x_0$ with a
kinetic energy $K = E$, where $E$ is sufficiently small, then $x$ will be restricted to
the neighborhood of $x_0$ defined by $U(x) \le E$, and the momenta will be restricted
by the condition $K \le E$, since, by the conservation of energy, $K + U = E$.
(See Fig. 13.9.) Thus the $q^i$ and the $p^i$ will remain small. This does not mean
that the solutions to the original mechanical system with Hamiltonian $H$ will
remain close to one fixed solution curve of the linearized system. It does mean




[Fig. 13.9]
that for any short interval of time (that is, a short interval near any time in the
future), the trajectory will be close to some trajectory of the linearized system.
It is therefore important to study the behavior of such mechanical systems.
We are thus interested in the following kind of mechanical system: The configuration
space $M$ is a vector space. The Riemann metric is given by a Euclidean
metric on the manifold. The potential energy is a positive definite quadratic
form on this vector space. Let us choose rectangular coordinates $x^1, \dots, x^n$
with respect to the Euclidean metric. Thus

\[ \|\langle x^1, \dots, x^n\rangle\|^2 = (x^1)^2 + \cdots + (x^n)^2, \]
and the map $\mathcal{L}$ is given by
\[ p^i = \dot q^i. \]
Thus
\[ L(q, \dot q) = K(\dot q) - U(q) = \tfrac12 \Bigl(\sum_i (\dot q^i)^2 - \sum_{i,j} a_{ij} q^i q^j\Bigr) \]
and Lagrange’s equations become

\[ \frac{dq^i}{dt} = \dot q^i, \qquad \frac{d\dot q^i}{dt} = -\sum_j a_{ij} q^j. \tag{13.1} \]

(Of course, these are just the Euler equations (11.5) for the case at hand where
$a = 0$.)
Equations (13.1) can be written more suggestively as follows: Let $A$ be the
linear transformation whose matrix is $(a_{ij})$. Stated invariantly, $A$ is the unique
self-adjoint linear transformation such that

\[ U(v) = \tfrac12 (Av, v). \tag{13.2} \]

Then the trajectories $v(\cdot)$ of the system are the solutions of the second-order
differential equations

\[ \frac{d^2 v}{dt^2} = -Av. \tag{13.3} \]

To find the actual solutions of (13.1) and (13.3) we apply Theorem 3.1 of
Chapter 5. According to that theorem, if $M$ is finite-dimensional, we can find an
orthonormal basis $e^1, \dots, e^n$ of eigenvectors of $A$. In other words, if $z^i$ are the


rectangular coordinates corresponding to this basis,
\[ U(z^1, \dots, z^n) = \tfrac12 \bigl(\lambda_1^2 (z^1)^2 + \cdots + \lambda_n^2 (z^n)^2\bigr) \]
and Eqs. (13.1) and (13.3) become

\[ \frac{d^2 z^i}{dt^2} = -\lambda_i^2 z^i. \]

Thus the general solution of (13.3) is given by

\[ v(t) = (a_1 \cos\lambda_1 t + b_1 \sin\lambda_1 t) e^1 + \cdots + (a_n \cos\lambda_n t + b_n \sin\lambda_n t) e^n, \tag{13.4} \]

where the constants $a_i$ and $b_i$ are determined by

\[ v(0) = a_1 e^1 + \cdots + a_n e^n \]
and
\[ \frac{dv}{dt}(0) = \lambda_1 b_1 e^1 + \cdots + \lambda_n b_n e^n. \]

Thus the general motion is a superposition of independent oscillations, the
frequency of each oscillation being determined by the eigenvalues $\lambda_i^2$ of $A$. That
is why the mechanical system (13.1) is called a system of “small oscillations”.
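The recipe above can be sketched for a two-dimensional example. The code below is a minimal illustration, assuming that the eigenvalues of $A$ are written $\lambda_i^2$ so that the frequencies in (13.4) are $\lambda_i$; it diagonalizes a symmetric 2 by 2 matrix in closed form, builds the solution (13.4) from given initial data, and measures how well $d^2v/dt^2 = -Av$ holds by a central difference. The function names and the sample matrix are hypothetical.

```python
import math

# Normal modes of d^2 v / dt^2 = -A v for a symmetric 2x2 matrix A.
# Function names and sample data are illustrative assumptions.

def symmetric_eigen_2x2(p, q, r):
    """Eigenpairs of [[p, q], [q, r]], found by rotating to principal axes."""
    phi = 0.5 * math.atan2(2 * q, p - r)
    c, s = math.cos(phi), math.sin(phi)
    mu1 = p * c * c + 2 * q * c * s + r * s * s
    mu2 = p * s * s - 2 * q * c * s + r * c * c
    return [(mu1, (c, s)), (mu2, (-s, c))]


def solution(t, modes, v0, w0):
    """Formula (13.4) with v(0) = v0 and dv/dt(0) = w0."""
    x = [0.0, 0.0]
    for mu, e in modes:
        lam = math.sqrt(mu)                        # frequency of this mode
        a = v0[0] * e[0] + v0[1] * e[1]            # component of v(0) along e
        b = (w0[0] * e[0] + w0[1] * e[1]) / lam    # from dv/dt(0) = sum lam*b*e
        amp = a * math.cos(lam * t) + b * math.sin(lam * t)
        x[0] += amp * e[0]
        x[1] += amp * e[1]
    return x


A = [[2.0, 1.0], [1.0, 2.0]]                       # positive definite sample
modes = symmetric_eigen_2x2(A[0][0], A[0][1], A[1][1])
v0, w0 = (1.0, 0.0), (0.0, 0.5)

# Central-difference residual of d^2 v / dt^2 + A v at one sample time.
t, h = 0.7, 1e-3
xm, x0, xp = (solution(t - h, modes, v0, w0), solution(t, modes, v0, w0),
              solution(t + h, modes, v0, w0))
residual = [(xp[i] - 2 * x0[i] + xm[i]) / h ** 2
            + A[i][0] * x0[0] + A[i][1] * x0[1] for i in range(2)]
start = solution(0.0, modes, v0, w0)
```

For this matrix the eigenvalues are 3 and 1, so the motion is a superposition of two oscillations with frequencies $\sqrt{3}$ and 1 along the directions $(1, 1)/\sqrt{2}$ and $(-1, 1)/\sqrt{2}$.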


14. SMALL OSCILLATIONS (Continued) 


So far, we have been considering the linearized equations (13.1) as an approxi- 
mation to a finite-dimensional mechanical system. The philosophy has been 
that the solutions to the actual mechanical system exist, but are hard to find. 
We use (13.3) as good approximating equations. 


[Fig. 13.10. A curve joining (0, 0) to (1, 0).]


It turns out that the method has more extensive applications, even to the
case of infinite-dimensional systems where the very existence of solutions to the
“actual” mechanical systems may be difficult to establish. Let us illustrate by
the mechanical system consisting of a stretched string which is held fixed at two
endpoints. For simplicity in illustration, we shall assume that the string is
restricted to move in the $xy$-plane, although this is in no way essential to the
argument. Let us assume that the string is homogeneous and that the two
fixed points are (0, 0) and (1, 0). Then the configuration space should be all
possible smooth curves joining (0, 0) to (1, 0). Thus the curve shown in Fig.
13.10 would be a possible element of our configuration space. In some sense,




the configuration space is an “infinite-dimensional manifold” and with suitable work
this idea can be made precise. However, what is of interest to us is behavior
near the “equilibrium” curve $C(t) = (t, 0)$. For such curves we will have
$dx/dt > 0$, and so the curve can be described by using $x$ as independent variable,
i.e., by giving a function $u(x)$. In other words, we are replacing the big configuration
space by the approximating vector space $V$ of all functions $u$ of one
variable, with $u(0) = u(1) = 0$. Thus $V$ is regarded as the “tangent space”
to our system, in the sense that $u$ is the “tangent vector” to the curve $C_s(\cdot)$,
where $C_s(\tau) = (\tau, su(\tau))$. (Remember that our configuration space is a
collection of curves, so that a curve in our configuration space is a one-parameter
family of curves.) Now we expect the “kinetic energy” of $u$ to be the total of
the kinetic energy of all the particles on the string. The particle at $\tau$ has velocity
$u(\tau)$ and therefore kinetic energy $\tfrac12 m u(\tau)^2$. If we assume that the mass density
is constant, we thus get

\[ K_0(u) = \tfrac12 m \int_0^1 u^2\, dx \]
as the expression for the kinetic energy. This makes $V$ into a pre-Hilbert space
as in Section 6 of Chapter 6.


We expect that the potential energy depends on the stretching of the string,
i.e., that it is some function of the length:

\[ U(C) = F\Bigl(\int_0^1 \|C'(t)\|\, dt\Bigr) = F(L), \]

where $L$ is the length of the curve and $F$ is some smooth function with $F(1) = 0$.
For curves parametrized by $x$, the length is given by

\[ L = \int_0^1 \sqrt{1 + \Bigl(\frac{du}{dx}\Bigr)^2}\, dx. \]

Now
\[ \sqrt{1 + a^2} = 1 + \tfrac12 a^2 + \text{higher-order terms}. \]
Thus using a Taylor expansion for $F$,
\[ F(L) = F'(1)(L - 1) + \text{quadratic terms in } (L - 1), \]
and
\[ L - 1 = \tfrac12 \int_0^1 \Bigl(\frac{du}{dx}\Bigr)^2 dx + \text{higher-order terms in } \frac{du}{dx}, \]
we see that $U_2$, the quadratic approximation to $U$, is given by

\[ U_2(u) = \frac{c}{2} \int_0^1 \Bigl(\frac{du}{dx}\Bigr)^2 dx, \qquad\text{where } c = F'(1). \]


By analogy with (13.3) we expect that the “small oscillations” of the string are
solutions $u_t(\cdot)$ of the equations
\[ \frac{d^2 u_t}{dt^2} = -A u_t, \]
where $A$ is the self-adjoint linear operator such that

\[ (Au, u) = c \int_0^1 \Bigl(\frac{du}{dx}\Bigr)^2 dx. \]

Now
\[ (Au, u) = m \int_0^1 u\,(Au)\, dx. \]

Since $u(0) = u(1) = 0$, we have, by integration by parts,

\[ \int_0^1 \frac{du}{dx} \frac{du}{dx}\, dx = -\int_0^1 u \frac{d^2 u}{dx^2}\, dx, \]
so that
\[ Au = -\frac{c}{m} \frac{d^2 u}{dx^2}. \]
Equation (13.3) thus becomes
\[ \frac{\partial^2 u}{\partial t^2} = \frac{c}{m} \frac{\partial^2 u}{\partial x^2}, \qquad u_t(0) = u_t(1) = 0. \tag{14.1} \]


Note that we have derived (14.1) by reasoning by analogy. We have not
formulated the actual (nonlinear) infinite-dimensional mechanical system, nor
have we any guarantee that there is one that can be solved. Furthermore, we
don’t actually know the function $F$. We only need to know its form and the
value of $F'(1)$. Nevertheless, Eq. (14.1) gives a good explanation of observed
physical phenomena.

The solution of (14.1) proceeds just as in the finite-dimensional case due to
the fact that the operator $A$ with the boundary conditions $u(0) = u(1) = 0$
is a Sturm-Liouville system, so the results of Sections 6 and 7 of Chapter 6
apply. In fact, we can choose the functions

\[ u_n(x) = \sin(n\pi x) \]

as an orthogonal basis of eigenvectors of $A$, where $u_n$ has the eigenvalue
$n^2 (c/m) \pi^2$. Thus the general solution is given by

\[ u_t(x) = (a_1 \cos\alpha t + b_1 \sin\alpha t) \sin(\pi x) + (a_2 \cos 2\alpha t + b_2 \sin 2\alpha t) \sin(2\pi x) + \cdots, \]

where $\alpha^2 = (c/m)\pi^2$. In other words, the general solution is a superposition


(linear combination) of the “harmonics”
\[ \sin n\alpha t \,\sin n\pi x, \qquad \cos n\alpha t \,\sin n\pi x. \]

These can be regarded as “standing waves”, as, for example, in Fig. 13.11.

[Fig. 13.11. The standing wave $\sin 3\alpha t \sin 3\pi x$.]
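A quick numerical check of a single harmonic may be helpful here. The sketch below assumes $\alpha = \pi\sqrt{c/m}$ and verifies by central differences that $\cos(n\alpha t)\sin(n\pi x)$ satisfies the string equation (14.1) and its boundary conditions; the function names, the step size, and the sample values of $c$ and $m$ are illustrative.

```python
import math

# One harmonic of the vibrating string, checked against (14.1).
# Names, step size and sample constants are illustrative assumptions.

def standing_wave(n, x, t, c, m):
    alpha = math.pi * math.sqrt(c / m)
    return math.cos(n * alpha * t) * math.sin(n * math.pi * x)


def wave_residual(n, x, t, c, m, h=1e-4):
    """Central-difference estimate of u_tt - (c/m) u_xx at the point (x, t)."""
    def u(xx, tt):
        return standing_wave(n, xx, tt, c, m)
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h ** 2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_tt - (c / m) * u_xx


c, m, n = 2.0, 0.5, 3
residual = wave_residual(n, 0.3, 0.2, c, m)
left_end = standing_wave(n, 0.0, 0.2, c, m)
right_end = standing_wave(n, 1.0, 0.2, c, m)
```

The residual is small compared with the size of $u_{tt}$ itself (of order $(n\alpha)^2$), and both endpoint values vanish up to rounding, so the nodes at $x = 0$ and $x = 1$ stay fixed for all time.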


As another illustration of this method, let us consider the “vibrating membrane”
in $n$ dimensions. Here we are given a domain $D$ with almost regular
boundary in $E^n$. We consider a stretched membrane in $E^{n+1} = E^n \times E^1$
which is fastened along $\partial D$ in $E^n \times \{0\}$. Again, as our linear approximation $V$
to the configuration manifold, we take the space of all functions $u$ on $D$ which
vanish on 6D, To be precise, we let V be the space of functions which we 
defined, are of class C* in some neighborhood D, and vanish on 6D. Thus the 
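membrane is the surface in $E^{n+1}$ whose points are of the form

\[ \langle x^1, \dots, x^n, u(x^1, \dots, x^n)\rangle, \]
where $\langle x^1, \dots, x^n\rangle \in D$. As before, we define the kinetic energy as
\[ K(u) = \tfrac12 m \int_D u^2, \]

while the potential energy is to be some function of the total volume (area)
which vanishes at $\mu(D)$. Now the total volume (area) of the hypersurface is
given by (see Exercise 4.3, Chapter 10)

\[ \int_D \sqrt{1 + \sum_i \Bigl(\frac{\partial u}{\partial x^i}\Bigr)^2}. \]

Thus, as before,

\[ U_2(u) = \frac{c}{2} \int_D \sum_i \Bigl(\frac{\partial u}{\partial x^i}\Bigr)^2 = \frac{c}{2} D[u, u], \]

where $D[\ ,\ ]$ is the Dirichlet integral introduced in Section 11 of Chapter 12. By
Green’s formula we have (since $u = 0$ on $\partial D$)

\[ D[u, v] = -\int_D u\, \Delta v. \]

Thus the operator $A$ of (13.3) is given by $-(c/m)\Delta$, and Eq. (13.3) becomes

\[ \frac{\partial^2 u}{\partial t^2} = \frac{c}{m} \Delta u. \tag{14.2} \]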


Note that this is exactly (14.1) if $n = 1$. In order to solve (14.2), we must find
a complete set of eigenvectors for (14.2). If $\{u_i\}$ is such a basis, where $\lambda_i^2$ is the
eigenvalue associated to $u_i$, then the general solution of (14.2) would be given by

\[ u_t(x) = \sum_i (a_i \cos\lambda_i t + b_i \sin\lambda_i t)\, u_i(x). \]

The problem of showing that $-\Delta$ has a complete set of eigenvectors is
somewhat more difficult for $n > 1$, and will be discussed in the exercises.


EXERCISES 


In order to study the eigenvalue problem it is convenient to replace the space of
$C^1$-functions vanishing on $\partial D$ by the space $H_1^0$ (where we refer back to page 361 for the
definition of the spaces $H_s^0$). To show that this replacement is legitimate, we have:


14.1 Show that if $\varphi$ is a function which is $C^1$ in a neighborhood of $\bar D$, then $\varphi \in H_1^0$
if and only if $\varphi$ vanishes on $\partial D$.

14.2 Let the operator $K$ be defined as $Kf = (1 - \Delta)f$ for smooth $f$ and extended as
in Section 14 of Chapter 8 to a map of $H_s \to H_{s-2}$. Let $g \in H_0$. Since

\[ |(g, v)_0| \le \|g\|_0 \|v\|_0 \le \|g\|_0 \|v\|_1 \quad\text{for any } v \in H_1^0, \]
conclude that there is a bounded linear map $L$ from $H_0$ to $H_1^0$ satisfying
\[ (Lg, v)_1 = (g, v)_0 \quad\text{for all } v \in H_1^0. \]


14.3 Let $f$ and $g$ be locally integrable functions on $D$. By this we mean that $f\varphi$ and
$g\varphi$ are integrable for every test function $\varphi \in C_0^\infty(D)$. We say that the differential
equation $Kf = g$ holds weakly on $D$ if $(f, K\varphi) = (g, \varphi)$ for all test functions $\varphi \in C_0^\infty(D)$.
Show that this generalizes the notion of being a solution by proving the following
lemma.


Lemma. If the functions $f$ and $g$ are respectively in the classes $C^2(D)$ and $C(D)$,
then the equation $Kf = g$ holds in the weak sense if and only if it holds in the
classical sense.


In proving this lemma you may assume that if $h$ is a continuous function on $D$
such that $\int_D h\varphi = 0$ for every test function $\varphi$, then $h = 0$.


14.4 Prove that the operator $L$ of Exercise 14.2 is a right inverse of $K$ in the weak
sense. That is, show that if $g \in H_0$ and $f = Lg$, then $Kf = g$ weakly on $D$.

14.5 We now want to show that $f$ is actually in $C^2(D)$ if $g$ is suitably smooth.
Roughly speaking, what we want to do is to “round off” $f$ and $g$ near the boundary of $D$
in such a way that we can consider the adjusted functions to be defined on the whole
of $R^n$, and can then apply Exercises 14.25 and 14.30 of Chapter 8.

Our rounding off process will simply be multiplication by an arbitrary but fixed
function $\psi$ in $C_0^\infty(D)$. Prove, to begin with, that multiplication by such a $\psi$ is a bounded
linear mapping of $H_s$ into itself for any $s$.


14.6 We know that $D_j = \partial/\partial x_j$ is a bounded linear mapping from $H_s$ to $H_{s-1}$ for
every $s$. Combine this fact with the result of the above exercise to show that if $Kf = g$
weakly on $D$, and if $\psi$ is any fixed element of $C_0^\infty(D)$, then there is a differential operator
$R$ of order 1 defined on the whole of $R^n$ such that

1) $K(\psi f) = \psi g + Rf$ weakly on $D$;

2) $h \mapsto Rh$ is a bounded linear mapping from $H_s$ to $H_{s-1}$ for every $s$.

In order to consider $R$ to be defined on $R^n$ we have to extend $\psi$ to $R^n$ in an
obvious way. The proof is essentially an integration by parts.


14.7 We say that a function $h$ defined on $D$ is locally in $H_s$ if $\psi h \in H_s$ (when extended
to $R^n$) for every $\psi \in C_0^\infty(D)$. Use the above exercise to prove the following lemma:


Lemma. Suppose that $Kf = g$ weakly in $D$, that $f \in H_s$ locally in $D$, and that
$g \in H_t$ locally in $D$. Then $f \in H_{\min(t+2,\, s+1)}$ locally in $D$.


[Hint: In order to prove this crucial lemma, show first that the weak differential
equation of Exercise 14.6 holds for all test functions $\varphi$ on $R^n$. The crux of the matter
is that there is a function $\chi \in C_0^\infty(D)$ such that $\chi = 1$ on the support of $\psi$. We extend $\chi$
to $R^n$ as above, and then for each test function $\varphi$ on $R^n$ we write

\[ \varphi = \chi\varphi + (1 - \chi)\varphi, \]

where $\chi\varphi \in C_0^\infty(D)$. Now use the fact that the test functions $\varphi$ on $R^n$ are dense in $H_s$
for every $s$, that $\psi f \in H_s$, and that $\psi g \in H_t$.]

14.8 Suppose now that $g \in H_0$, that $f = Lg \in H_1^0$ (Exercise 14.2), and that $Kf = g$
weakly on $D$ (Exercise 14.4). Apply the above lemma repeatedly to show that if
$g \in H_m$ locally in $D$, then $f \in H_{m+2}$ locally in $D$. Conclude from Sobolev’s lemma that
if $m > n/2 + j$ and $g \in H_m$ locally in $D$, then $f = Lg \in C^{j+2}(D)$.

14.9 Show that $\|Lg\|_1 \le \|g\|_0$ and conclude from Exercise 14.31 of Chapter 8 that if
we regard $L$ as an operator from $H_0$ to $H_0$, it is compact, and all of its eigenvectors
belong to $H_1^0$. Use Exercise 14.8 to show that every eigenvector belongs to $C^\infty(D)$.


15. CANONICAL TRANSFORMATIONS 


In Sections 1 through 5 we formulated the notion of a mechanical system as a
flow of a certain type on the cotangent bundle of the configuration space. The
defining equations for the vector field $X$ generating this flow were $X \lrcorner \Omega =
-dH$. Thus the basic property of the cotangent bundle used in singling out the
class of flows is the existence of the two-form $\Omega$. It turns out that in studying
the equations of mechanics it is sometimes convenient to forget that the flow is
on the cotangent bundle and to concentrate on the form $\Omega$. For instance, we may
be able to introduce charts that don’t arise from the configuration manifold but
in terms of which the vector field $X$ takes a particularly simple form. We shall
therefore want to consider a manifold $N$ which carries a two-form $\Omega$, subject to
certain restrictions which we shall describe below. On such manifolds we shall
study vector fields satisfying $X \lrcorner \Omega = -dH$. It will be convenient to allow $H$
to depend on the time $t$, as well as being a function on $N$, so that $X$ will be a
time-dependent vector field. The reason for this is twofold. First of all, it allows


13.15 CANONICAL TRANSFORMATIONS 559 


the consideration of “nonconservative” mechanical systems. Secondly, even in 
the study of the systems we have introduced so far, it is sometimes convenient 
to make a time-dependent change of coordinates to simplify the equations. This 
has the effect of changing # time-independent vector field into a time-dependent 
one. Now to the definitions: 


Definition. A manifold N is said to possess a Hamiltonian structure (or to
be a Hamiltonian manifold) if there is an exterior two-form $\Omega$ defined on N
such that


i) $d\Omega = 0$ and
ii) $\Omega$ is of maximal rank in the sense that (4.3) holds.


Remarks
a) As we have seen, if $N = T^*(M)$, then N is a Hamiltonian manifold
where $\Omega$ is given by (4.1).


b) If N is finite-dimensional, then it must be even-dimensional. In fact,
condition (ii) says that $\Omega$ restricted to each tangent space is an antisymmetric
bilinear form which is nonsingular. This can happen on a vector space only if it is
even-dimensional.


c) It can be proved that if N is a finite-dimensional Hamiltonian manifold,
then one can always find local coordinates $q_1, \ldots, q_n, p_1, \ldots, p_n$ such that


$$\Omega = \sum dp_i \wedge dq_i.$$


[We know this to be the case if $N = T^*(M)$.] The point of this result (which
we shall not prove here) is that locally all Hamiltonian manifolds of the same
finite dimension look alike.
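
Remark (b) rests on the identity $\det A = \det A^T = \det(-A) = (-1)^n \det A$ for an antisymmetric $n \times n$ matrix, which forces $\det A = 0$ when n is odd. The following is a minimal sketch of a numerical check, not part of the original text; the helper names `det` and `random_antisymmetric` and the sample matrix `J4` (the matrix of $\sum dp_i \wedge dq_i$ in dimension 4, up to ordering conventions) are illustrative assumptions.

```python
from fractions import Fraction
import random


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Look for a nonzero pivot at or below the diagonal.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def random_antisymmetric(n, rng):
    """Random integer matrix with m[j][i] == -m[i][j] (hypothetical helper)."""
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = rng.randint(-5, 5)
            m[j][i] = -m[i][j]
    return m


rng = random.Random(0)
# Antisymmetric forms in odd dimensions are always singular.
odd_dets = [det(random_antisymmetric(n, rng)) for n in (1, 3, 5, 7)]

# A nonsingular antisymmetric form in dimension 4: block matrix [[0, I], [-I, 0]].
J4 = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]
det_J4 = det(J4)
```

Exact rational arithmetic is used so that the odd-dimensional determinants come out as exact zeros rather than small floating-point residues.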


We shall now single out a class of vector fields on N x R. 


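Definition. The vector field $\mathcal{X}$ is a Hamiltonian vector field if there is a
function $H = H_{\mathcal{X}}$ on $N \times \mathbb{R}$ such that


$$\mathcal{X}t = \langle \mathcal{X}, dt \rangle = 1, \tag{15.1}$$


where t is the standard coordinate on $\mathbb{R}$, regarded as a function on $N \times \mathbb{R}$,
and
$$\mathcal{X} \lrcorner (\pi^*\Omega - dH \wedge dt) = 0, \tag{15.2}$$


where $\pi$ is the projection of $N \times \mathbb{R}$ onto N: $\pi(x, t) = x$. Note that H is
determined up to a function of t alone.


Let us set
$$\omega = \pi^*\Omega,$$
so that (15.2) can be written as


$$\mathcal{X} \lrcorner (\omega - dH \wedge dt) = 0. \tag{15.2'}$$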


[Fig. 13.12]


If we consider the direct sum decomposition of the tangent space of $N \times \mathbb{R}$,
then condition (15.1) says that we can write


$$\mathcal{X} = X + \frac{\partial}{\partial t},$$


where X is a time-dependent vector field on N; that is, X is a rule which assigns
a tangent vector $X(x, t) \in T_x(N)$ to each x and t. Since $\omega$ does not involve dt,
we can write

$$\mathcal{X} \lrcorner \omega = X(\cdot, t) \lrcorner \Omega \quad \text{at any time } t. \tag{15.3}$$


[Strictly speaking, this equality should be written as follows: Let $\iota_t : N \to N \times \mathbb{R}$
be the map defined by $\iota_t(x) = (x, t)$. Then


$$\iota_t^*(\mathcal{X} \lrcorner \omega) = X(\cdot, t) \lrcorner \Omega.] \tag{15.3'}$$
Also,


$$\langle \mathcal{X}, dH \rangle = \langle X(\cdot, t), dH(\cdot, t) \rangle + \frac{\partial H}{\partial t}.$$


Thus (15.2) can be split up into two equations. When we compare the terms not
involving dt, we obtain


$$X(\cdot, t) \lrcorner \Omega = -dH(\cdot, t) \quad \text{for every fixed } t. \tag{15.2a}$$
Thus
$$\mathcal{X} \lrcorner \omega = -dH + \frac{\partial H}{\partial t}\,dt. \tag{15.2b}$$


Note that (15.2a) is just the condition stated at the beginning of Section 5 with
the novelty that H (and therefore X) can now depend on time.


Definition. A diffeomorphism $\Phi$ of $N \times \mathbb{R} \to N \times \mathbb{R}$ is called a canonical
transformation if

i) $\Phi^*(\omega) = \omega - dW \wedge dt$, where $W = W_\Phi$ is some function depending
on $\Phi$; and

ii) $\Phi$ is time-preserving, i.e., $\Phi$ has the form $\Phi(x, t) = (\varphi(x, t), t)$, where
$\varphi(\cdot, t)$ is a diffeomorphism of N for each t.


[Fig. 13.13]


Observe that if $\Phi$ is a canonical transformation, then so is $\Phi^{-1}$ and
$$W_{\Phi^{-1}} = -(\Phi^{-1})^* W_\Phi. \tag{15.4}$$


Also observe that if $\Phi$ and $\Psi$ are canonical transformations, then so is $\Phi \circ \Psi$,
where
$$W_{\Phi \circ \Psi} = \Psi^* W_\Phi + W_\Psi. \tag{15.5}$$
These facts follow directly from the definitions and will be left as exercises for the
reader.

We note next that if $\Phi$ is a canonical transformation and $\mathcal{X}$ is a Hamiltonian
vector field, then $\Phi^*(\mathcal{X})$ is also a Hamiltonian vector field. In fact,
$$\Phi^*\mathcal{X} \lrcorner [\omega - d(W_\Phi + \Phi^*H) \wedge dt] = \Phi^*\mathcal{X} \lrcorner [\Phi^*\omega - d(\Phi^*H) \wedge dt] = \Phi^*[\mathcal{X} \lrcorner (\omega - dH \wedge dt)] = 0.$$
Thus we may take $H_{\Phi^*\mathcal{X}}$ as
$$H_{\Phi^*\mathcal{X}} = W_\Phi + \Phi^*H. \tag{15.6}$$

Let $\mathcal{X}$ be a Hamiltonian vector field, and let $\Phi$ be the map of $N \times \mathbb{R} \to N \times \mathbb{R}$
obtained by letting the system evolve from time t = 0 according to the flow
generated by $\mathcal{X}$. That is, let the map $\varphi(\cdot, t)$ be defined so that the curve
$t \mapsto \varphi(x, t)$ is a solution curve to the (time-dependent) vector field X which
passes through x at time t = 0. To put it another way, the curve $t \mapsto (\varphi(x, t), t)$
is the solution curve to the vector field $\mathcal{X}$ which passes through (x, 0) at time
zero. (See Figs. 13.12 and 13.13.)
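Note that it follows from the definition of $\Phi$ that


$$\Phi_*\left(\frac{\partial}{\partial t}\right) = \mathcal{X}.$$


We claim that $\Phi$ is a canonical transformation. In fact,


$$\iota_t^*(\Phi^*\omega) = (\Phi \circ \iota_t)^* \pi^*\Omega = \varphi(\cdot, t)^*\Omega,$$


since $\Phi \circ \iota_t(x) = (\varphi(x, t), t)$. But


$$\frac{d}{dt}\,\varphi(\cdot, t)^*\Omega = \varphi(\cdot, t)^* D_{X(\cdot, t)}\Omega = 0$$
by (15.2a). Thus


$$\iota_t^*(\Phi^*\omega) = \Omega,$$
or, in other words, $\Phi^*\omega$ is of the form $\omega + \theta \wedge dt$. To determine $\theta$ it suffices to
take the interior product with $\partial/\partial t$, since $\omega$ doesn't involve dt. But


$$\frac{\partial}{\partial t} \lrcorner \Phi^*\omega = \Phi^*\left[\Phi_*\left(\frac{\partial}{\partial t}\right) \lrcorner \omega\right] = \Phi^*(\mathcal{X} \lrcorner \omega) = \Phi^*\left(-dH + \frac{\partial H}{\partial t}\,dt\right),$$


so $\theta = \Phi^* dH$. Thus $\Phi$ is a canonical transformation and
$$W_\Phi = -\Phi^* H_{\mathcal{X}}. \tag{15.7}$$


Note that (15.7) is just what we would expect from (15.6). In fact, $\Phi^*\mathcal{X} = \partial/\partial t$,
and we may take $H_{\partial/\partial t} = 0$.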
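
The key step above is that the time-t map of a Hamiltonian flow pulls the two-form $\Omega$ back to itself. On a two-dimensional phase space with $\Omega = dp \wedge dq$ this is the statement that the Jacobian determinant of $(q_0, p_0) \mapsto (q, p)$ equals 1. The sketch below checks this numerically for the harmonic oscillator $H = p^2/2m + kq^2/2$, whose flow is known in closed form. The function names and parameter values are illustrative assumptions, not taken from the text.

```python
import math


def oscillator_flow(q0, p0, t, m=1.0, k=1.0):
    """Exact time-t flow of H = p^2/(2m) + k q^2/2 starting at (q0, p0)."""
    w = math.sqrt(k / m)
    q = q0 * math.cos(w * t) + p0 / (m * w) * math.sin(w * t)
    p = p0 * math.cos(w * t) - m * w * q0 * math.sin(w * t)
    return q, p


def jacobian_det(q0, p0, t, h=1e-6, **params):
    """Central-difference Jacobian determinant of (q0, p0) -> (q, p)."""
    qa, pa = oscillator_flow(q0 + h, p0, t, **params)
    qb, pb = oscillator_flow(q0 - h, p0, t, **params)
    qc, pc = oscillator_flow(q0, p0 + h, t, **params)
    qd, pd = oscillator_flow(q0, p0 - h, t, **params)
    dq_dq0, dp_dq0 = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp0, dp_dp0 = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0


# The pullback of dp ^ dq under the flow is det(Jacobian) * dp ^ dq,
# so each determinant should be 1 up to rounding.
dets = [jacobian_det(0.3, -1.2, t, m=2.0, k=5.0) for t in (0.0, 0.5, 1.7, 10.0)]
```

In higher dimensions the determinant alone is not enough; one would compare the full matrix $M^T J M$ with $J$, where $M$ is the Jacobian of the flow.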

Equations (15.6) and (15.7) are used in conjunction in the following way:
Suppose that $H = H_0 + H_1$, where we know how to solve the differential
equations corresponding to $H_0$. In other words, we can find the map $\Phi$ corresponding
to the vector field $\mathcal{X}_0$, where $\mathcal{X}_0 \lrcorner (\omega - dH_0 \wedge dt) = 0$. If


$$\mathcal{X} \lrcorner (\omega - dH \wedge dt) = 0,$$


then $\Phi^*\mathcal{X}$ is a vector field whose corresponding Hamiltonian function is $\Phi^*H_1$,
by (15.6), since $W_\Phi + \Phi^*H = -\Phi^*H_0 + \Phi^*(H_0 + H_1) = \Phi^*H_1$.

This method was first introduced by Lagrange in the study of the n-body
problem. We can let $H_0$ be the Hamiltonian obtained by ignoring the terms in
the potential energy coming from the interaction of the planets, and let $H_1$ be
the rest of the Hamiltonian H. The solution for $H_0$ is then given by having the
planets move about the sun according to Kepler's laws. For simplicity in
discussion, let us restrict our attention to that portion of phase space where the
motion is elliptical. Then the motion of the planets is specified by giving the
various parameters of each ellipse (such as the plane of the ellipse, its major
axis, its eccentricity, etc.) and telling the position of the planet on its ellipse at
time t = 0. This corresponds to the use of the map $\Phi$. One then regards the
equations of motion of the whole system as differential equations for the
parameters of each ellipse. This corresponds to studying the vector field $\Phi^*\mathcal{X}$.
This idea of introducing the parameters of the ellipses as "generalized" coordinates
was one of the key steps leading to the notion of the invariant calculus
on manifolds.

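We have seen that solving the differential equations corresponding to a
Hamiltonian H is the same as looking for a map $\Phi$ satisfying (15.7). Under
certain circumstances this can be reduced to looking for the solution to a certain
partial differential equation. Suppose that we have local coordinates $q_1, \ldots, q_n$,
$p_1, \ldots, p_n$ such that $\Omega = \sum dp_i \wedge dq_i$. Let $V = V(q_1, \ldots, q_n, p_1, \ldots, p_n, t)$
be a function such that the maps $\varphi_1$ and $\varphi_2$ are diffeomorphisms, where


$$\varphi_1(q_1, \ldots, q_n, p_1, \ldots, p_n, t) = \left(q_1, \ldots, q_n, \frac{\partial V}{\partial q_1}, \ldots, \frac{\partial V}{\partial q_n}, t\right)$$


and
$$\varphi_2(q_1, \ldots, q_n, p_1, \ldots, p_n, t) = \left(\frac{\partial V}{\partial p_1}, \ldots, \frac{\partial V}{\partial p_n}, p_1, \ldots, p_n, t\right).$$
We claim that $\Phi = \varphi_1 \circ \varphi_2^{-1}$ is a canonical transformation and that
$$W_\Phi = \varphi_2^{-1*}\,\frac{\partial V}{\partial t}. \tag{15.8}$$


Note that $\omega = d\left(\sum p_i\,dq_i\right) = -d\left(\sum q_i\,dp_i\right)$. Thus
$$\begin{aligned}
\Phi^*\omega - \omega &= d\,\varphi_2^{-1*}\left(\varphi_1^* \sum p_i\,dq_i + \varphi_2^* \sum q_i\,dp_i\right) \\
&= d\,\varphi_2^{-1*}\left(\sum \frac{\partial V}{\partial q_i}\,dq_i + \sum \frac{\partial V}{\partial p_i}\,dp_i\right) \\
&= d\,\varphi_2^{-1*}\left(dV - \frac{\partial V}{\partial t}\,dt\right) \\
&= -d\left(\varphi_2^{-1*}\,\frac{\partial V}{\partial t}\right) \wedge dt.
\end{aligned}$$


If we substitute into (15.7) we see that $\Phi$ solves our differential equations if and
only if


$$\varphi_2^{-1*}\,\frac{\partial V}{\partial t} + \Phi^*H = 0.$$


But $\Phi^*H = \varphi_2^{-1*}\varphi_1^*H$, so we can write this equation as


$$\frac{\partial V}{\partial t} + \varphi_1^*H = 0 \tag{15.9}$$

or
$$\frac{\partial V}{\partial t} + H\left(q_1, \ldots, q_n, \frac{\partial V}{\partial q_1}, \ldots, \frac{\partial V}{\partial q_n}, t\right) = 0. \tag{15.9'}$$


Equation (15.9) is known as the Hamilton-Jacobi equation.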

We therefore have a prescription for (locally) solving the equations of
motion: Find a solution V of (15.9) which has the property that $\varphi_1$ and $\varphi_2$ are
diffeomorphisms. Under certain circumstances, a proper choice of coordinates
allows us to solve (15.9) by the method of "separation of variables". We illustrate
this method in the following examples.


Example 1. Central-force motion again. Here
$$H = \frac{1}{2m}\left(p_r^2 + \frac{p_\theta^2}{r^2}\right) + U(r),$$


when we use polar coordinates in the plane. Equation (15.9) becomes


$$\frac{\partial V}{\partial t} + \frac{1}{2m}\left[\left(\frac{\partial V}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial V}{\partial \theta}\right)^2\right] + U(r) = 0.$$


Since the variables t and $\theta$ do not occur explicitly in this equation, we seek a
solution of the form
$$V = V_1(t) + V_2(\theta) + V_3(r)$$


and conclude that both $V_1'(t)$ and $V_2'(\theta)$ depend only on r, and so must be
constants. We may thus write
$$V_1'(t) = -E, \qquad V_2'(\theta) = A_\theta,$$


where E and $A_\theta$ are constants. (This just reflects the conservation of energy and
angular momentum.) We then get the equation


$$\frac{dV_3}{dr} = V_3'(r) = \sqrt{2m(E - U(r)) - \frac{A_\theta^2}{r^2}}.$$
Thus


$$V = A_\theta\theta + \int^r \sqrt{2m(E - U(s)) - \frac{A_\theta^2}{s^2}}\,ds - Et$$


is a solution of (15.9). Here we consider V a function of the variables $r, \theta, E, A_\theta$.
Then the map $\varphi_2$ is given by


$$\varphi_2(r, \theta, E, A_\theta) = \left(-t + \int^r \frac{m\,ds}{\sqrt{2m(E - U(s)) - A_\theta^2/s^2}},\ \ \theta - \int^r \frac{A_\theta\,ds}{s^2\sqrt{2m(E - U(s)) - A_\theta^2/s^2}},\ \ E,\ A_\theta\right).$$


Now the map $\varphi_1 \circ \varphi_2^{-1}$ takes the flow into the constant flow, so that we must
have
$$t - t_0 = \int^r \frac{m\,ds}{\sqrt{2m(E - U(s)) - A_\theta^2/s^2}}$$
and
$$\theta - \theta_0 = \int^r \frac{A_\theta\,ds}{s^2\sqrt{2m(E - U(s)) - A_\theta^2/s^2}},$$


where $t_0$ and $\theta_0$ are constants. Note that the second of these equations gives the
orbit explicitly, while the first can then be solved to give r as a function of t.
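
As a sanity check on the orbit integral of Example 1, one can specialize to the inverse-square potential $U(r) = -K/r$, for which the orbit is the conic $r = p/(1 + e\cos\theta)$ with $p = A_\theta^2/(mK)$ and $e = \sqrt{1 + 2EA_\theta^2/(mK^2)}$. The sketch below compares a numerical quadrature of $\int A_\theta\,ds / (s^2\sqrt{2m(E - U(s)) - A_\theta^2/s^2})$ between two radii with the angle swept along the conic. The choice of potential, the parameter values, and the helper names are assumptions made for illustration; the interval is kept strictly between the turning points so that the integrand stays smooth.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Illustrative values: mass, force constant, (negative) energy, angular momentum.
m, K, E, A = 1.0, 1.0, -0.3, 1.0


def U(s):
    """Inverse-square attraction, assumed here only for the check."""
    return -K / s


def integrand(s):
    """d(theta)/dr along the outgoing branch of the orbit."""
    return A / (s * s * math.sqrt(2 * m * (E - U(s)) - A * A / (s * s)))


p = A * A / (m * K)                              # semi-latus rectum
e = math.sqrt(1 + 2 * E * A * A / (m * K * K))   # eccentricity


def theta_conic(r):
    """Polar angle measured from perihelion on r = p / (1 + e cos(theta))."""
    return math.acos((p / r - 1) / e)


r_a, r_b = 0.8, 2.0  # both strictly between perihelion and aphelion
delta_quadrature = simpson(integrand, r_a, r_b)
delta_conic = theta_conic(r_b) - theta_conic(r_a)
```

The two angle differences should agree to within the quadrature error, which illustrates that the second integral formula reproduces Kepler's ellipse in this special case.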


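Example 2. The simple harmonic oscillator. Here


$$H = \frac{p^2}{2m} + \frac{kq^2}{2},$$
so Eq. (15.9) becomes
$$\frac{1}{2m}\left(\frac{\partial V}{\partial q}\right)^2 + \frac{kq^2}{2} + \frac{\partial V}{\partial t} = 0.$$
Again, since time doesn't enter explicitly, we can write
$$V = -Et + W,$$


where W is a function of q alone which must satisfy


$$\frac{1}{2m}(W')^2 + \frac{kq^2}{2} = E$$


or
$$W = \sqrt{mk}\int^q \left(\frac{2E}{k} - s^2\right)^{1/2} ds.$$
Then
$$\frac{\partial V}{\partial E} = \sqrt{\frac{m}{k}}\int^q \frac{ds}{\sqrt{(2E/k) - s^2}} - t.$$
Thus


$$t - t_0 = -\sqrt{\frac{m}{k}}\,\cos^{-1}\left(q\sqrt{\frac{k}{2E}}\right).$$


Solving for q in terms of t gives


$$q = \sqrt{\frac{2E}{k}}\,\cos\left[\sqrt{\frac{k}{m}}\,(t - t_0)\right].$$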
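
The oscillator computation can be checked numerically in two places: the function W should satisfy $(W')^2/2m + kq^2/2 = E$, and the resulting trajectory $q(t) = \sqrt{2E/k}\cos[\sqrt{k/m}\,(t - t_0)]$ should carry constant energy E. The following is a small sketch of both checks; the parameter values and function names are illustrative assumptions, and the momentum is obtained by a finite difference rather than in closed form.

```python
import math

# Illustrative parameters: mass, spring constant, energy, phase constant.
m, k, E, t0 = 2.0, 5.0, 3.0, 0.4


def W_prime(q):
    """Derivative of W = sqrt(mk) * integral of (2E/k - s^2)^(1/2) ds."""
    return math.sqrt(m * k) * math.sqrt(2 * E / k - q * q)


def q_of_t(t):
    """Trajectory obtained by solving dV/dE = const for q."""
    return math.sqrt(2 * E / k) * math.cos(math.sqrt(k / m) * (t - t0))


def p_of_t(t, h=1e-5):
    """Momentum p = m dq/dt, approximated by a central difference."""
    return m * (q_of_t(t + h) - q_of_t(t - h)) / (2 * h)


# Residual of the reduced Hamilton-Jacobi equation at a few positions
# inside the classically allowed region |q| < sqrt(2E/k).
hj_residuals = [W_prime(q) ** 2 / (2 * m) + k * q * q / 2 - E
                for q in (-1.0, 0.0, 0.5, 1.0)]

# Energy along the trajectory at a few times; each should be close to E.
energies = [p_of_t(t) ** 2 / (2 * m) + k * q_of_t(t) ** 2 / 2
            for t in (0.0, 0.7, 2.3)]
```

The residuals vanish up to rounding, and the energies match E up to the finite-difference error in the momentum.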


Example 3. The motion of a particle attracted by two fixed point masses. Here


$$H = \frac{1}{2m}\left(p_x^2 + p_y^2 + p_z^2\right) + \frac{A}{r_1} + \frac{B}{r_2},$$
where $r_1$ and $r_2$ are the distances to the two points and A and B are constants
(determined by the masses of these two points).
For the purpose of solving this problem, it is convenient to introduce so-called
elliptical coordinates. Let us assume that the two fixed points lie on the
x-axis, each at a distance c from the origin. We may take c = 1 for simplicity.
(See Fig. 13.14.) In the xy-plane define the local coordinates $\xi$ and $\eta$ by setting
$$\xi = \tfrac{1}{2}(r_1 + r_2), \qquad \eta = \tfrac{1}{2}(r_1 - r_2).$$


Thus the curves $\xi$ = const represent ellipses with semimajor axis $\xi$ and foci at
the two fixed points, while the curves $\eta$ = const are hyperbolas with semimajor
axis $\eta$ and the same foci. Note that $0 \le |\eta| \le 1 \le \xi < \infty$. The equations of
these two curves are

$$\frac{x^2}{\xi^2} + \frac{y^2}{\xi^2 - 1} = 1 \quad\text{and}\quad \frac{x^2}{\eta^2} - \frac{y^2}{1 - \eta^2} = 1,$$
so that
$$x^2 = \xi^2\eta^2 \quad\text{and}\quad y^2 = (\xi^2 - 1)(1 - \eta^2).$$
Thus
$$\frac{dx}{x} = \frac{d\xi}{\xi} + \frac{d\eta}{\eta}, \qquad \frac{dy}{y} = \frac{\xi\,d\xi}{\xi^2 - 1} - \frac{\eta\,d\eta}{1 - \eta^2},$$


and therefore


$$dx^2 + dy^2 = (\xi^2 - \eta^2)\left(\frac{d\xi^2}{\xi^2 - 1} + \frac{d\eta^2}{1 - \eta^2}\right).$$


If we now rotate about the x-axis to get the analogue of cylindrical coordinates
in space, we have the coordinates


$$\langle x, \rho, \theta \rangle \quad\text{and}\quad \langle \xi, \eta, \theta \rangle$$
in space, and the Euclidean metric is given by
$$dx^2 + dy^2 + dz^2 = dx^2 + d\rho^2 + \rho^2\,d\theta^2 = (\xi^2 - \eta^2)\left(\frac{d\xi^2}{\xi^2 - 1} + \frac{d\eta^2}{1 - \eta^2}\right) + (\xi^2 - 1)(1 - \eta^2)\,d\theta^2.$$
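Also, since $r_1 = \xi + \eta$ and $r_2 = \xi - \eta$,
$$U = \frac{A}{r_1} + \frac{B}{r_2} = \frac{Ar_2 + Br_1}{r_1 r_2} = \frac{1}{\xi^2 - \eta^2}\,(\alpha\xi - \beta\eta),$$


where $\alpha = A + B$ and $\beta = A - B$. Then H takes the form
$$H(\xi, \eta, \theta, p_\xi, p_\eta, p_\theta) = \frac{1}{2m(\xi^2 - \eta^2)} \times \left[(\xi^2 - 1)p_\xi^2 + (1 - \eta^2)p_\eta^2 + \left(\frac{1}{\xi^2 - 1} + \frac{1}{1 - \eta^2}\right)p_\theta^2 + 2m(\alpha\xi - \beta\eta)\right].$$


Since t and $\theta$ do not occur explicitly in (15.9), we may write
$$V = -Et + A_\theta\theta + W,$$


where now W must satisfy


$$(\xi^2 - 1)\left(\frac{\partial W}{\partial \xi}\right)^2 + (1 - \eta^2)\left(\frac{\partial W}{\partial \eta}\right)^2 + \left(\frac{1}{\xi^2 - 1} + \frac{1}{1 - \eta^2}\right)A_\theta^2 + 2m(\alpha\xi - \beta\eta) = 2m(\xi^2 - \eta^2)E.$$


Note that if we set $W = W_1(\xi) + W_2(\eta)$, this equation separates into two, and
we can explicitly solve each of them by quadratures. This gives the solution to
the original equations of motion. We leave the details to the reader.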
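
The coordinate identities used in Example 3 are easy to get wrong by a sign, so a numerical spot check is worthwhile. The sketch below takes a point $(x, \rho)$ in a half-plane through the x-axis, computes $\xi = \tfrac12(r_1 + r_2)$ and $\eta = \tfrac12(r_1 - r_2)$, and verifies $x = \xi\eta$, $\rho^2 = (\xi^2 - 1)(1 - \eta^2)$, and the expression of $A/r_1 + B/r_2$ in terms of $\alpha = A + B$ and $\beta = A - B$. It assumes that $r_1$ is the distance to the focus at $x = -1$ and $r_2$ the distance to the focus at $x = +1$ (the labeling for which $x = \xi\eta$ holds with a plus sign); the function name and the sample values are illustrative.

```python
import math


def elliptical(x, rho):
    """Elliptical coordinates for foci at x = -1 and x = +1 (assumed labeling)."""
    r1 = math.hypot(x + 1.0, rho)  # distance to the focus at x = -1
    r2 = math.hypot(x - 1.0, rho)  # distance to the focus at x = +1
    xi = 0.5 * (r1 + r2)
    eta = 0.5 * (r1 - r2)
    return xi, eta, r1, r2


x, rho = 0.7, 1.3            # sample point off the axis
xi, eta, r1, r2 = elliptical(x, rho)

A, B = 2.0, 0.5              # sample constants
alpha, beta = A + B, A - B

U_direct = A / r1 + B / r2
# With r1 = xi + eta and r2 = xi - eta, the numerator A*r2 + B*r1
# equals alpha*xi - beta*eta.
U_elliptic = (alpha * xi - beta * eta) / (xi ** 2 - eta ** 2)

x_check = xi * eta
rho_sq_check = (xi ** 2 - 1.0) * (1.0 - eta ** 2)
```

Swapping the labels of the two foci changes the sign of $\eta$, and with it the sign in front of $\beta\eta$, so the convention should be fixed before separating variables.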


SELECTED REFERENCES 


Chapters 1, 2, and 7
A more extensive treatment of linear algebra, over arbitrary coefficient fields, can be
found in


Birkhoff, G., and S. MacLane, A Survey of Modern Algebra, 3rd ed., Macmillan,
New York, 1965.


Hoffman, K., and R. Kunze, Linear Algebra, Prentice-Hall, Englewood Cliffs, N. J.,
1961.

For linear algebra over commutative rings that are not fields, see

Lang, S., Algebra, Addison-Wesley, Reading, Mass., 1965.


Zariski, O., and P. Samuel, Commutative Algebra, 2 vols., Van Nostrand, Princeton,
1959, 1960.


Chapter 3 


Difficult but rewarding is 


Dieudonné, J., Foundations of Modern Analysis, Academic Press, New York, 1960.
The differential notation in this book is $Df(a)$ (instead of $dF_a$) and $Df(a)\cdot\xi$
(instead of $dF_a(\xi)$).


Chapters 4 and 5 


Good books on general (point set) topology are 
Kelley, J., General Topology, Van Nostrand, Princeton, 1961.
Simmons, G., Topology and Modern Analysis, McGraw-Hill, New York, 1963. 


The remaining standard examples of Banach spaces and Hilbert space require the 
Lebesgue integral and are therefore beyond our scope. However, the interested reader 
can pursue the abstract theory of Banach and Hilbert spaces in Simmons and in such 
books as 
Hille, E., and R. S. Phillips, Functional Analysis and Semigroups, American Mathematical
Society, Providence, R. I., 1957.
Murray, F. J., An Introduction to Linear Transformations in Hilbert Space, Princeton
University Press, Princeton, 1941.
Riesz, F., and B. Sz.-Nagy, Functional Analysis, Ungar, New York, 1955.
Yosida, K., Functional Analysis, Springer, Berlin, 1965.
The books by Murray and Yosida are more advanced and harder.


Chapter 6 


Standard books on ordinary differential equations are 
Birkhoff, G., and G.-C. Rota, Ordinary Differential Equations, Ginn, Boston, 1962.


Hurewicz, W., Lectures on Ordinary Differential Equations, M.I.T. Press, Cambridge, 
Mass., 1958. 


Advanced treatises are 


Coddington, E., and N. Levinson, Theory of Ordinary Differential Equations,
McGraw-Hill, New York, 1955.


Hartman, P., Ordinary Differential Equations, Wiley, New York, 1964.


Chapter 8 


This chapter has been devoted to the theory of content. For modern mathematics
the more powerful theory of Lebesgue measure and integration is needed. For
one-dimensional theory the reader can consult


Rudin, W., Principles of Mathematical Analysis, 2nd ed., McGraw-Hill, New York,
1964.


For the general theory the reader can consult
Halmos, P., Measure Theory, Van Nostrand, Princeton, 1961.


Another way of computing certain definite integrals, involving the "residue calculus",
is discussed in a set of exercises at the end of Chapter 12, and can be read independently
after Chapters 8 and 11.

For the relationship of integration to "generalized functions" see


Gel'fand, I. M., and G. E. Shilov, Generalized Functions, vol. 1, Academic Press,
New York, 1964. A little complex variable theory is necessary to read this book.


Chapters 9, 10, and 11 


For a more abstract treatment see 
Lang, S., Introduction to Differentiable Manifolds, Interscience, New York, 1962.


For a less abstract treatment see


Flanders, H., Differential Forms with Applications to Physical Sciences, Academic
Press, New York, 1963.


Fleming, W., Functions of Several Variables, Addison-Wesley, Reading, Mass., 1965.
Spivak, M., Calculus on Manifolds, Benjamin, New York, 1965.


A more extensive treatment of differential geometry will be found in


Willmore, T., An Introduction to Differential Geometry, Oxford University Press,
London, 1959.




Chapter 12 


The classical book on potential theory is 

Kellogg, O., Foundations of Potential Theory, Springer, Berlin, 1929.

The relationship between harmonic functions on the plane and analytic functions is
studied in standard texts on the latter subject, such as

Ahlfors, L., Complex Analysis, 2nd ed., McGraw-Hill, New York, 1966.

Hille, E., Analytic Function Theory, Ginn, Boston, 1959.


Chapter 13 


A standard modern book on mechanics is
Goldstein, H., Classical Mechanics, Addison-Wesley, Reading, Mass., 1952.


For the classical astronomy-oriented aspect of mechanics see

Poincaré, H., Leçons de mécanique céleste, Gauthier-Villars, Paris, 1905-1910.

Whittaker, E. T., Analytical Dynamics, 4th ed., Cambridge University Press, Cambridge,
England, 1937.

A leisurely geometrical study of classical mechanics will be found in 

Lanczos, C., The Variational Principles of Mechanics, University of Toronto Press, 
Toronto, 1960. 

For a book which treats classical mechanics in the spirit of this chapter, and which
also studies statistical mechanics and quantum mechanics, see

Mackey, G., Mathematical Foundations of Quantum Mechanics, Benjamin, New York, 
1965. 

An elegant treatment of the geometrical applications of the calculus of variations will
be found in

Milnor, J., Morse Theory, Princeton University Press, Princeton, 1965.




absolute convergence, of an infinite 
series, 221 

absolute value, 121, 242 
absolutely integrable function, 351 
adjoint of T, 85
affine independence, 81 
affine span, 56 
affine subspace, 40, 52 
affine transformation, 53 
algebra, 30, 46 

Banach, 223 

of matrices, 92 
alternating tensor, 310 
amplitude, 292 
analytic plane R?, 10 
angular momentum, 523 
annihilator, 84 
antisymmetric tensor, 310 
associative law, for composition, 14 
atlas, 364 


ball, closed, 124 

open, 123, 196 
Banach algebra, 223 
Banach space, 217 
Banach-Tarski paradox, 321 
basis, 71 

dual, 82 

in Hom(V, W), 96

infinite, 75 

for pre-Hilbert space, 254 
basis isomorphism, 72 
Bessel’s inequality, 254 
bijective, 12 
bilinear, 67 
bilinear functionals, 305ff 
block decompositions, 65ff 


INDEX 


bound variable, 2 

boundary, 198 
boundary-value problem, 294ff 
bounded below, 128 

bounded function, 122 
bounded linear mapping, 127 
bounded set, 123

braces, 7 


calculus of variations, 182ff
canonical basis, 104 
canonical transformation, 558, 560 
Cartesian axes, 38 
Cartesian product, of an indexed
collection, 14 
of two sets, 9 
Cauchy sequence, 216 
cellulation, 473 
center, of gravity, 349 
of mass, 529 
central force, 523, 564
centrifugal force, 517 
centrifugal potential, 517 
chain rule, 153 
change of basis, $4 
change of variable formula, 342, 355 
characteristic function, Fourier 
transform, 337 
of a set, 12 
characteristic polynomial, 259 
classical mechanics, 509
closed set, 124, 197 
closure, 198 
codimension, 80 
codomain, 12 
cofactor, 315 
column space (of a matrix), 91 


column vector, 93 
commutative diagram, 86 
compact transformation, 264 
compactness, in general topology, 214 
on a manifold, 403 
sequential, 206 
compatible atlases, 370 
complement, 17
complementary subspaces, 57 
completeness, 217 
completion, 222 
complex conjugation, 241 
complex normed linear space, 244 
complex numbers, 240ff 
complex vector space, 24, 241 
complexification, 244 
component, 58 
composite-function rule, 143 
composite truth-functional forms, 5
composition, 14 
configuration space, 509 
conjugate space, 82 
connectives, logical, 3 
conservation, of angular momentum, 523 
of energy, 521 
laws, 521 
constant coefficient equation, 278 
constrained maximum, 173f
content, 332 
contented function, 339 
contented sets, 331f 
continuity, 126, 196, 201, 203 
contraction mapping, 229 
contravariant vector, 95 
convergence, of infinite series, 221 
sequential, 202 
convex set, 125 
coordinate, 43, 72 
coordinate correspondence, 36f 
coordinate functional, 33, 72 
coordinate isomorphism, 72 
coordinate n-tuple, 72 
coordinate projection, 43 
correspondence, 9 
coset, 52 
cotangent bundle, 511 
covariant tensor, 307 


covariant vector, 95 

Cramer’s rule, 101, 315 

critical point, 161, 188ff 

cylindrical coordinates, 345 (Ex. 11.5) 


definite (form), 115 
degree (of a polynomial), 64 
De Morgan’s law, 17 
dense subset, 212 
density, 408 
dependent set of vectors, 73 
derivative, 146 
in a Banach algebra, 226
over C, 242 
directional, 147 
determinant, 99ff, 108, 312ff 
diagonal matrix, 262 
diffeomorphism, 376 
differentiable manifolds, 363, 369ff 
differential, 142 
differential forms, 390f 
differential geometry, 459 
differentiation under the integral sign, 
181 
dimension, 78 
dimensional identities, 78f, 84 
direct sum, 56 
directed frame, 9 
directed line segment, 37 
directional derivative, 147 
disjoint (family of sets), 18 
distance, in a metric space, 196 
between vectors, 122 
divergence theorem, 421
division algorithm, 64 
domain, with almost regular boundary, 
424 
with regular boundary, 419 
domain, of a relation, 9 
of a variable, 8 
dual basis, 82 
dual space, 82 
duality, 15, 68 
dyad, 86 


echelon matrix, 104 
eigenvalue, 34, 258 


eigenvector, 34, 258 

element (of a set), 6

elementary bilinear functionals, 306 
elementary matrices, 105f 
elementary row operations, 103 
elliptical coordinates, 566 

empty set, 7 

equations of variation, 513f 
equicontinuity, 215 

equivalence relation, 19 

equivalent directed line segments, 37 
equivalent norms, 132, 204 
equivalent truth-functional forms, 5 
Euclidean norm, 38, 122 

Euler angles, 548 

Euler's equation(s), 184, 541, 543 
existential quantification, 2 
exponential function, 226, 228 
exterior algebra, 316ff 

exterior calculus, 429 

exterior differential forms, 429 
exterior forms, 316 


fibering, 19 

finite-dimensional, 28 

first variation, 183 

fixed-point theorem, 229ff 

floor of a paving, 329

flow, 381 

formal adjoint, 295 

Fourier coefficient, 254 

Fourier series, 254, 301ff

Fourier transform, 355, 357 

frame, 71 

free variables, 2 

frequency, 292 

function, 10f 

function space, 24 

functional dependence, 175ff 

fundamental form on T*(M), 515

fundamental solution, 277 

fundamental theorem, of algebra, 243 
of calculus, 238 


of ordinary differential equations, 267 


geodesic, 539 
geodesic coordinates, 537 




geodesic curvature, 468 
geodesic polar coordinates, 470 
germ of functions, 138 

global solutions, 269 

graph, 9, 162 

greatest lower bound, 119 
group, 15 


Hamilton-Jacobi equation, 563 
Hamiltonian mechanics, 520 
Hamiltonian structure, 559 
Hamiltonian vector fields, 519, 559 
Heine-Borel property, 214 
Hermitian symmetry, 249 

Hilbert space, 249 

homogeneity, of a norm, 121 
homogeneous equation, 277, 282ff 
homogeneous function, 148 
homogeneous polynomial on V, 145 
hyperplane, 41 


idempotent, 59 

if...,then...,4 

image, 10 

immersion, 399 

implicit-function theorem, 167, 230 
inclusion (set), 7 

increasing norm on R", 134 
increasing sequence, 207 
independent set of vectors, 71f 
independent subspaces, 56 

indexed collection, 13 

inertia tensor, 546 

infinite series, 221, 224ff 
infinitesimal, 137 

infinitesimal generator, 379, 381 
inhomogeneous equation, 277, 288ff 
initial condition, 267 

injection (product), 47 

injective, 12 

inner content, 331 

inner paving, 331 

integral, of a parametrized arc, 236ff
integration, 321, 336 

interior, 197 

interior point, 124 
intermediate-value theorem, 120 




intersection, 17 
invariant subspace, 54, 63 
inverse, of a function, 15 

of a relation, 9 
inverse-mapping theorem, 167 
isomorphism, 34 

between normed linear spaces, 132 
iterated integrals, 346 


Jacobian determinant, 159 
Jacobian matrix, 158f 


Kepler’s first law, 527 
Kepler’s second law, 524 
Kepler’s third law, 527, 582 
kernel, 34 

kinetic energy, 521 
Kronecker delta function, 74 


Lagrange multipliers, 174 
Lagrange’s equations, 530 
law of cosines, 41, 250 
least upper bound property, 119 
left inverse, 14 
lemma of Du Bois-Reymond, 184 
length of a curve, Riemannian manifold, 
401 
Lie bracket, 389 
Lie derivative, of a density, 417 
of an exterior differential form, 431, 
452 
of a function, 383f 
of a linear differential form, 392 
of a vector field, 385, 388 
limit, of a function, 116, 126 
line (straight), 38 
line of nodes, 548 
linear combination, 27 
linear combination mapping, 31 
linear differential equations, 276ff
linear functionals, 32, 82 
linear transformation, 30 
Lipschitz continuity, 126 
local solution, 269
locally absolutely integrable density, 
408 
locally absolutely integrable n-form, 435 
locally uniformly Lipschitz, 267 


mapping, 12 

matrix, 32, 88 

maximal solution, 269 

maximum, relative, 161 

mean-value theorem, 148f 

member (of a set), 6 

metric space, 196 

minimal polynomial, 261

momentum, 511 

momentum function of a vector field, 
517 (Eq. 3.6) 

monomial on V, 145 

monotone sequence, 207 

multi-index notation, 193 

multilinear functionals, 306ff 

multivector, 318 


name, 7 
natural isomorphism, 69, 86 
negation, 6 
neighborhood, 137, 201 
deleted, 137 
nilpotent, 50 
nonnegative transformation, 257 
nonsingular, 93 
norm, 121 
norm metric, 196 
normal distributions, 355 (Ex. 13.1), 356 
normed linear space, 121 
nth-order equation, 270ff 
null space, 34 
nutation, 550 


one-parameter group, 
one-to-one, 11 

open set, 124, 196 

or (connective), 4 

ordered basis, 71 

ordered n-tuple, 13

ordered pair, 9 

orthogonal, 250 

orthogonal projection, 253ff 
orthogonal transformation, 262 
orthonormal, 254 
orthonormal basis, 112, 254 
oscillating system, 291ff 
outer content, 331 

outer paving, 331 


pair set, 7
parallel translation, 40
parallelogram law, 251
parametric equations (of a line), 38
parametrized arc, 146
Parseval's equation, 254
partial derivative, 156f
partial differential, 153
partition, 19
partition of unity, 405
    subordinate to an open covering, 406
patch manifold, 172
paved function, 338
paved set, 324, 326
paving, 324
permutations, 308
phase angle, 294
plane, 40
Poisson bracket, 519
polar coordinates, 345
polar decomposition, 263
polynomials, 62, 64
positive definite, 115, 190
precession, 547
pre-Hilbert space, 249
principle of mechanical similarity, 532
product, matrix, 91
product norm, 133
product rule (for differentials), 155
product vector space, 43
projection, 19, 58
projection-injection identities, 47
property, 7
pullback, of densities, 416
    of exterior differential forms, 392
    of functions, 372
    of linear differential forms, 392
    of vector fields, 384
Pythagorean theorem, 41, 251


quadratic form, 112
quotation marks, 7
quotient space, 54


range, of a linear transformation, 34
    of a relation, 9
rank, of a linear transformation, 85
    of a matrix, 91
real vector space, 23f
rectangles, 323f
regular differential equations, 282, 298
regular solid, 473
relation, 9
Rellich's lemma, 362 (Ex. 14.30)
resonance, 294
restricted variables, 8
restriction (of a relation), 10
Riemann metric, 397
right inverse, 14, 61
rigid-body motion, 544
row-reduced echelon form, 104
row space (of a matrix), 91
row vector, 93


scalar, 28
scalar product, 38, 248
Schwarz inequality, 125, 249
second conjugate space, 83
second difference, 188
second differential, 150, 186ff
self-adjoint, 257
seminorm, 125
semiscalar product, 248
separation of variables, 563f
sequential convergence, 202ff
set, 6
shear transformation, 56
sign of a permutation, 309
signature (of a quadratic form), 113
simple harmonic motion, 293
skeleton, 32
small oscillations, 551
smooth arc, 146
Sobolev's inequality, 361
solid angle, 474
span, linear, 28
spherical coordinates, 345 (Ex. 11.4)
standard basis, 74
star operator, 320
state, 510
statement, 1
statement frame, 1
step function, 237
Sturm-Liouville system, 298
submanifold, 172, 367
subsequence, 205
subset, 7
subspace (vector), 24
successive integration, 346
support, of a density, 408
    of a differential form, 425
    of a function, 336
surjective, 12
symmetric tensor, 310


tangent bundle, 511
tangent plane, 162
tangent space, 373
tangent vector, 146, 373
tautology, 5
Taylor's formula, 191ff
tensor product, 305f
topology, 201
total linear momentum, 528
totally bounded, 212
trace, 99
translation, 40, 53
transpose, 90
triangle inequality, for metrics, 196
    for norms, 121
truth-functional forms, 5
truth table, 3
two-body problem, 528
two-norm, 250


undetermined coefficients, 289
uniform conditions, 210ff
uniform continuity, 179, 210
uniform convergence, 211
uniform norm, 123
uniformly absolutely integrable, 353
union, 17
unit set, 7
universal quantifier, 1
upper triangular, 66


variation of parameters, 289
variational principles, 532
vector analysis, 457
vector field, 44
vector space, 23
volume density of a Riemann metric, 411
volume of an immersed hypersurface, 413, 415


weak sequential convergence, 245
wedge operation, 316
Weierstrass approximation theorem, 304


